29.26 List-columns

  • List-columns are implicit in the definition of the data frame: a data frame is a named list of equal-length vectors.
  • Base R doesn’t make it easy to create list-columns, because data.frame() treats a list as a list of columns.
data.frame(x = list(1:3, 3:5))
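Both points can be checked directly. This is a minimal base R sketch; the variable name `df` is only illustrative.

```r
# A data frame is a list of equal-length vectors under the hood
df <- data.frame(x = list(1:3, 3:5))

is.list(df)          # TRUE: a data frame is a list
typeof(df)           # "list"
dim(df)              # 3 rows, 2 columns: each list element became its own column
sapply(df, is.list)  # FALSE for both columns: neither is a list-column
```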
  • You can prevent data.frame() from treating a list as a list of columns by wrapping the argument in I(). However, the result doesn’t print well.

I() stands for “Inhibit Interpretation/Conversion of Objects”: it changes the class of an object (by adding "AsIs") to indicate that it should be treated as is.

data.frame(
  x = I(list(1:3, 3:5)),
  y = c("1, 2", "3, 4, 5")
)
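To confirm what I() did here, the sketch below inspects the resulting column in base R. The name `df` is illustrative and not part of the original notes.

```r
df <- data.frame(
  x = I(list(1:3, 3:5)),
  y = c("1, 2", "3, 4, 5")
)

class(df$x)    # includes "AsIs", the marker class added by I()
is.list(df$x)  # TRUE: x is a genuine list-column
dim(df)        # 2 rows, 2 columns: one row per list element
lengths(df$x)  # number of values stored in each cell of x
```

Each cell of x holds a whole integer vector, which is why the default data.frame print method struggles to display it.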
  • Tibble alleviates this problem by being lazier (tibble() doesn’t modify its inputs) and by providing a better print method.

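Note where the quotes are placed: x is a list of integer vectors, while y is a character vector whose strings only look like lists of numbers.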

library(tibble)  # provides tibble() and tribble()

tibble(
  x = list(1:3, 3:5),
  y = c("1, 2", "3, 4, 5")
)
  • tribble() can automatically work out that you need a list.
tribble(
   ~x, ~y,
  1:3, "1, 2",
  3:5, "3, 4, 5"
)
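A quick way to verify that tribble() chose a list for x only is to inspect the column types. This sketch assumes the tibble package is installed; `df` is an illustrative name.

```r
library(tibble)

df <- tribble(
   ~x, ~y,
  1:3, "1, 2",
  3:5, "3, 4, 5"
)

is.list(df$x)       # TRUE: the cells of x are vectors, so x became a list-column
is.character(df$y)  # TRUE: y holds single strings, so it stayed atomic
nrow(df)            # one row per line of the tribble() body
```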
  • List-columns are often most useful as an intermediate data structure.
  • The advantage of keeping related items together in a data frame is worth a little hassle.

There are three parts of an effective list-column pipeline:

  1. You create the list-column using one of: nest(), summarise() + list(), or mutate() + a map function.
  2. You create other intermediate list-columns by transforming existing list-columns with map(), map2(), or pmap().
  3. You simplify the list-column back down to a data frame or atomic vector.
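
The three steps above can be sketched as follows. This is one possible illustration, not the only way to build such a pipeline: the built-in mtcars dataset, the lm() model, and the names `by_cyl`, `models`, and `result` are assumptions chosen for the example, and it requires the dplyr, tidyr, and purrr packages.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Step 1: create a list-column by nesting all columns except cyl
by_cyl <- mtcars %>%
  nest(data = -cyl)

# Step 2: create further list-columns by mapping over the existing one
models <- by_cyl %>%
  mutate(
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)),  # one model per group
    coefs = map(fit, stats::coef)                  # named coefficient vectors
  )

# Step 3: simplify a list-column back down to an atomic vector
result <- models %>%
  mutate(slope = map_dbl(coefs, "wt")) %>%
  select(cyl, slope)
```

Here `result` is an ordinary tibble with one row per value of cyl, while `models` keeps the data, the fitted models, and the coefficients side by side in the same rows.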