29.16 Creating list-columns
Typically, you won’t create list-columns with tibble(). Instead, you’ll create them from regular columns, using one of three methods:
- With tidyr::nest() to convert a grouped data frame into a nested data frame where you have a list-column of data frames.
- With mutate() and vectorised functions that return a list.
- With summarise() and summary functions that return multiple results.
Alternatively, you might create them from a named list, using tibble::enframe().
When creating list-columns, make sure they are homogeneous: each element should contain the same type of thing.
29.16.1 With nesting
nest() creates a nested data frame, meaning that each row is a meta-observation. When applied to a grouped data frame, nest() keeps the grouping columns as is and bundles the remaining columns into the list-column.
- You can also use it on an ungrouped data frame, specifying which columns you want to nest.
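Both uses of nest() can be sketched on a small made-up data frame. The tibble `df` and its columns `country`, `year`, and `pop` are assumptions chosen for illustration, not data from the book.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two countries observed in several years
df <- tibble(
  country = c("A", "A", "B", "B", "B"),
  year    = c(2000, 2005, 2000, 2005, 2010),
  pop     = c(1.0, 1.2, 5.0, 5.5, 6.1)
)

# Grouped data frame: the grouping column is kept as is,
# everything else is bundled into the list-column `data`
by_country <- df %>%
  group_by(country) %>%
  nest()

# Ungrouped data frame: name the columns to nest explicitly
by_country2 <- df %>%
  nest(data = c(year, pop))
```

Each row of `by_country` now stands for one country, and `by_country$data[[1]]` is an ordinary tibble holding that country's observations.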
29.16.2 From vectorised functions
- If you use stringr::str_split() inside mutate(), you get a list-column.
Again, note where the quotes are placed.
- And now unnest() knows how to handle this list of vectors.
If you find yourself using this pattern a lot, make sure to check out tidyr::separate_rows(), which is a wrapper around this common pattern.
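The split, unnest, and separate_rows() steps might look like the sketch below. The input strings and the column names `x1` and `x2` are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical input: one comma-separated string per row
df <- tibble(x1 = c("a,b,c", "d,e,f,g"))

# str_split() returns a list of character vectors,
# so x2 becomes a list-column
split_df <- df %>%
  mutate(x2 = str_split(x1, ","))

# unnest() expands each vector into one row per piece
long_df <- split_df %>%
  unnest(x2)

# separate_rows() performs the split and the unnest in one step
wrapped <- df %>%
  separate_rows(x1, sep = ",")
```

Here `long_df$x2` and `wrapped$x1` should contain the same seven pieces, which is the sense in which separate_rows() wraps the pattern.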
Another example uses map(), map2(), and pmap(). We could take the example from Invoking different functions and rewrite it to use mutate().
Previous Example Code:
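library(tidyverse)  # provides tribble(), mutate(), %>% and invoke_map()

# Each row pairs a function name with its own list of parameters
sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)
# invoke_map() calls each function in f with its params plus n = 10
# (deprecated since purrr 1.0.0, though it still runs)
sim %>%
  mutate(sim = invoke_map(f, params, n = 10))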
Refactored Code using mutate():
sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)
sim %>%
  mutate(sims = invoke_map(f, params, n = 10))
I don’t understand what is being expressed here: the two code snippets are identical, except that in chapter 25 the new column is named sims instead. Thoughts?
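One way to make the mutate() version visibly use map2() is to spell out the iteration by hand. This is a sketch of an alternative, not the book's code: it swaps invoke_map() for map2() plus base R's do.call(), and the names `sim_out`, `fn`, and `args` are assumptions introduced here.

```r
library(dplyr)
library(purrr)
library(tibble)

sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)

set.seed(1)  # only so that repeated runs give the same draws

# map2() walks over f and params in parallel; do.call() looks up the
# function by name and calls it with n = 10 plus that row's parameters
sim_out <- sim %>%
  mutate(sims = map2(f, params, function(fn, args) {
    do.call(fn, c(list(n = 10), args))
  }))
```

The result is the same shape as before: `sims` is a list-column in which each element is a numeric vector of ten simulated values.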
29.16.3 From multivalued summaries
One restriction of summarise() is that it only works with summary functions that return a single value. That means you can’t use it with functions like quantile() that return a vector of arbitrary length.
You can, however, wrap the result in a list! This obeys the contract of summarise(), because each summary is now a list (a vector) of length 1.
To make useful results with unnest(), you’ll also need to capture the probabilities.
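A minimal sketch of the list-wrapping trick, using the built-in mtcars data. The choice of `mpg` grouped by `cyl`, and the names `probs`, `quantiles`, and `long`, are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

probs <- c(0.01, 0.25, 0.5, 0.75, 0.99)

# Wrapping in list() turns each multi-value result into a single
# list element, so summarise() sees one value per group.
# Storing probs alongside keeps each quantile labelled.
quantiles <- mtcars %>%
  group_by(cyl) %>%
  summarise(p = list(probs), q = list(quantile(mpg, probs)))

# Unnesting both list-columns together gives one row per
# group and probability
long <- quantiles %>%
  unnest(c(p, q))
```

Without the `p` column, the unnested `q` values would have no record of which probability each one belongs to.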
29.16.4 From a named list
What do you do if you want to iterate over both the names of a list and its elements?
- Make a data frame with one column containing the names and another (list-)column containing the elements!
You can use tibble::enframe().
Now, if we want to iterate over names and values in parallel, we can use map2().
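The enframe() and map2() combination could be sketched as follows. The list `x` and the summary column `smry` are illustrative assumptions, and map2_chr() is used here as the character-returning variant of map2().

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical named list
x <- list(a = 1:5, b = 3:4, c = 5:6)

# enframe() produces a `name` column and a `value` list-column
df <- enframe(x)

# Iterate over names and values in parallel: build a label from
# each name and the first element of its value
df_smry <- df %>%
  mutate(smry = map2_chr(name, value, function(nm, val) {
    paste0(nm, ": ", val[1])
  }))
```

Keeping names and values as two columns of one data frame means they stay aligned through any later filtering or sorting.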