29.27 Creating list-columns

Typically, you won’t create list-columns with tibble(). Instead, you’ll create them from regular columns, using one of three methods:

  1. With tidyr::nest() to convert a grouped data frame into a nested data frame that has a list-column of data frames.
  2. With mutate() and vectorised functions that return a list.
  3. With summarise() and summary functions that return multiple results.

Alternatively, you might create them from a named list, using tibble::enframe().

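When creating list-columns, make sure they are homogeneous: every element should contain the same type of thing (for example, all data frames or all character vectors).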

29.27.1 With nesting

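  • nest() creates a nested data frame, in which each row is a meta-observation: the ordinary columns identify the observation (such as country and continent), and the list-column holds the individual observations that make it up.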

  • When applied to a grouped data frame, nest() keeps the grouping columns as they are and bundles everything else into the list-column.

library(tidyverse)
library(gapminder)

gapminder %>%
  group_by(country, continent) %>%
  nest()

  • You can also use it on an ungrouped data frame, specifying which columns you want to nest.

gapminder %>%
  nest(data = c(year:gdpPercap))
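
A minimal, self-contained sketch of the round trip, swapping in the built-in mtcars data for gapminder (an assumption made only so the code runs without the gapminder package). It nests everything except cyl, peeks at one nested tibble, and then unnests to get the original rows back; the names by_cyl and restored are hypothetical.

```r
library(dplyr)
library(tidyr)

# mtcars stands in for gapminder here (illustrative swap).
# Nest every column except cyl into a list-column called `data`.
by_cyl <- mtcars %>%
  nest(data = -cyl)

by_cyl             # one row per cylinder count
by_cyl$data[[1]]   # the tibble of cars for the first cyl value

# unnest() reverses the operation and recovers all 32 cars
restored <- by_cyl %>%
  unnest(data)
nrow(restored)
```

Each nested tibble has 10 columns, because cyl stays outside as the identifying column.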

29.27.2 From vectorised functions

  • If you use stringr::str_split() inside mutate(), you get a list-column, because str_split() returns a list with one character vector per input string.

Note where the quotes are placed: each string, such as "a,b,c", is a single value of x1.

df <- tribble(
  ~x1,
  "a,b,c",
  "d,e,f,g"
)

df %>%
  mutate(x2 = stringr::str_split(x1, ","))
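
To see why mutate() produces a list-column here, it may help to call str_split() on its own: it returns one character vector per input string, and those vectors differ in length. The name pieces is just for illustration.

```r
library(stringr)

pieces <- str_split(c("a,b,c", "d,e,f,g"), ",")

str(pieces)       # a list of two character vectors
lengths(pieces)   # different lengths, so they cannot share one atomic column
```
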

  • unnest() knows how to handle these lists of vectors: it gives each element its own row.

df %>%
  mutate(x2 = stringr::str_split(x1, ",")) %>%
  unnest(x2)

If you find yourself using this pattern a lot, make sure to check out tidyr::separate_rows(), which is a wrapper around this common pattern.
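
For comparison, a sketch of the same split with separate_rows(), assuming tidyr is installed. Unlike the mutate() version, it overwrites x1 in place rather than keeping the original strings alongside the pieces; the name split_df is hypothetical.

```r
library(tibble)
library(tidyr)

df <- tribble(
  ~x1,
  "a,b,c",
  "d,e,f,g"
)

# One row per comma-separated piece, written back into x1
split_df <- separate_rows(df, x1, sep = ",")
split_df
```

Newer tidyr releases (1.3.0 and later) mark separate_rows() as superseded in favour of separate_longer_delim(), so recent code may prefer that function.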

Another example of this pattern uses map(), map2(), and pmap() from purrr. For example, we could take the final example from "Invoking different functions" and rewrite it to use mutate().

Previous Example Code:

sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)
sim %>%
  mutate(sim = invoke_map(f, params, n = 10))

Refactored Code using mutate():

sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)

sim %>%
  mutate(sims = invoke_map(f, params, n = 10))

I don’t understand what is being expressed here... the two code snippets are identical, except that in chapter 25 the new column is named sims instead of sim. Thoughts?
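
Since invoke_map() is retired in current versions of purrr, the same simulation can be sketched with map2() and base R's do.call(), which calls a function given its name and a list of arguments. This is a swapped-in technique, not the book's code; the seed and the name sim_out are arbitrary.

```r
library(tibble)
library(dplyr)
library(purrr)

sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)

set.seed(1)  # make the random draws reproducible
sim_out <- sim %>%
  mutate(
    # For each row, call the function named in f with its params plus n = 10;
    # map2() returns a list, so sims becomes a list-column
    sims = map2(f, params, function(fn, args) do.call(fn, c(args, list(n = 10))))
  )

sim_out
```

Either way, the point of the mutate() version is that the functions, their parameters, and their results all live side by side in one tibble.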

29.27.3 From multivalued summaries

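One restriction of summarise() is that it only works with summary functions that return a single value. This means you can’t use it with functions like quantile() that return a vector of arbitrary length. (This describes dplyr before 1.0.0; later versions relax the rule, and since dplyr 1.1.0 returning more than one row per group from summarise() is deprecated in favour of reframe().)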

# quantile() returns five values per group, not one
mtcars %>%
  group_by(cyl) %>%
  summarise(q = quantile(mpg))
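
In dplyr 1.1.0 and later, the multi-value case has its own verb, reframe(), which may return any number of rows per group. A sketch, assuming dplyr 1.1.0 or later; like the summarise() call above, it does not record which probability each quantile belongs to. The name q_long is hypothetical.

```r
library(dplyr)

# reframe() may return any number of rows per group
q_long <- mtcars %>%
  group_by(cyl) %>%
  reframe(q = quantile(mpg))

q_long  # 3 cylinder groups x 5 quantiles = 15 rows
```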

You can, however, wrap the result in a list! This obeys the contract of summarise(), because each summary is now a list (a vector) of length 1.

mtcars %>%
  group_by(cyl) %>%
  summarise(q = list(quantile(mpg)))
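
A quick look inside the result, as a sketch: q has one row per cyl value, and each element is the full named vector returned by quantile(). The name q_by_cyl is hypothetical.

```r
library(dplyr)

q_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(q = list(quantile(mpg)))

# Each element of q is the whole named vector from quantile()
q_by_cyl$q[[1]]
lengths(q_by_cyl$q)  # five values per group: 0%, 25%, 50%, 75%, 100%
```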

To make useful results with unnest(), you’ll also need to capture the probabilities.

probs <- c(0.01, 0.25, 0.5, 0.75, 0.99)
mtcars %>%
  group_by(cyl) %>%
  summarise(p = list(probs), q = list(quantile(mpg, probs))) %>%
  unnest(c(p, q))
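
An alternative sketch (a variation, not the book's approach) bundles each probability with its quantile in a small tibble per group, so p and q cannot drift out of step before unnest(). The names qs and quants are hypothetical.

```r
library(dplyr)
library(tibble)
library(tidyr)

probs <- c(0.01, 0.25, 0.5, 0.75, 0.99)

# One two-column tibble (p, q) per cyl group, stored in a list-column
quants <- mtcars %>%
  group_by(cyl) %>%
  summarise(qs = list(tibble(p = probs, q = quantile(mpg, probs)))) %>%
  unnest(qs)

quants  # 3 cylinder groups x 5 probabilities = 15 rows
```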

29.27.4 From a named list

What do you do if you want to iterate over both the names of a list and its contents?

  • Make a data frame with one column containing the names and another column (a list-column) containing the values!

You can use tibble::enframe().

x <- list(
  a = 1:5,
  b = 3:4,
  c = 5:6
)

df <- enframe(x)
df
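
The list-column appears because the input is a list. As a sketch, compare enframe() on the list x with enframe() on a plain named numeric vector, where value comes out as an ordinary column. The names list_df and vec_df are hypothetical.

```r
library(tibble)

x <- list(a = 1:5, b = 3:4, c = 5:6)

list_df <- enframe(x)                # value is a list-column: elements differ in length
vec_df  <- enframe(c(a = 1, b = 2))  # value is an ordinary numeric column

is.list(list_df$value)
is.list(vec_df$value)
```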

Now, if we want to iterate over names and values in parallel, we can use map2().

df %>%
  mutate(
    smry = map2_chr(name, value, ~ stringr::str_c(.x, ": ", .y[1]))
  )
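
If the goal is only this summary string, purrr's imap_chr() iterates over a list's values (.x) and names (.y) directly, without building a data frame first. A sketch, offered as an alternative rather than the book's method; smry here is a standalone vector, not a column.

```r
library(purrr)
library(stringr)

x <- list(a = 1:5, b = 3:4, c = 5:6)

# .x is each element, .y is its name
smry <- imap_chr(x, ~ str_c(.y, ": ", .x[1]))
smry
```

This should give the same strings as the smry column above ("a: 1", "b: 3", "c: 5"), here as a named character vector.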