3.6 Lists

  • Sometimes called a generic vector or recursive vector
  • Recall (section 2.3.3): each element is really a reference to another object
  • Can be composed of elements of different types (as opposed to atomic vectors, which must be of only one type)
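
The last point can be checked with a minimal base R sketch. The object names `mixed_list` and `mixed_atomic` are made up for illustration: a list keeps each element's type, whereas c() coerces mixed inputs to a single type.

```r
# A list keeps the type of every element
mixed_list <- list(1L, "a", TRUE, 2.5)
vapply(mixed_list, typeof, character(1))
#> [1] "integer"   "character" "logical"   "double"

# An atomic vector coerces everything to one type (here: character)
mixed_atomic <- c(1L, "a", TRUE, 2.5)
typeof(mixed_atomic)
#> [1] "character"
mixed_atomic
#> [1] "1"    "a"    "TRUE" "2.5"
```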

3.6.1 Constructing

Simple lists:

# Construct
simple_list <- list(
  c(TRUE, FALSE),   # logicals
  1:20,             # integers
  c(1.2, 2.3, 3.4), # doubles
  c("primo", "secundo", "tercio") # characters
)

simple_list
#> [[1]]
#> [1]  TRUE FALSE
#> 
#> [[2]]
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#> 
#> [[3]]
#> [1] 1.2 2.3 3.4
#> 
#> [[4]]
#> [1] "primo"   "secundo" "tercio"

# Inspect
# - type
typeof(simple_list)
#> [1] "list"
# - structure
str(simple_list)
#> List of 4
#>  $ : logi [1:2] TRUE FALSE
#>  $ : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ : num [1:3] 1.2 2.3 3.4
#>  $ : chr [1:3] "primo" "secundo" "tercio"

# Accessing
simple_list[1]
#> [[1]]
#> [1]  TRUE FALSE
simple_list[2]
#> [[1]]
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
simple_list[3]
#> [[1]]
#> [1] 1.2 2.3 3.4
simple_list[4]
#> [[1]]
#> [1] "primo"   "secundo" "tercio"

simple_list[[1]][2]
#> [1] FALSE
simple_list[[2]][8]
#> [1] 8
simple_list[[3]][2]
#> [1] 2.3
simple_list[[4]][3]
#> [1] "tercio"
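
The difference between the two operators used above can be checked directly. A short sketch with a small stand-in list (`x` is a hypothetical name): `[` always returns a list, while `[[` extracts the element itself.

```r
x <- list(c(TRUE, FALSE), c("primo", "secundo", "tercio"))

# `[` returns a (shorter) list
typeof(x[1])
#> [1] "list"
length(x[1])
#> [1] 1

# `[[` returns the element itself
typeof(x[[1]])
#> [1] "logical"
length(x[[1]])
#> [1] 2

# so `[[` has to come first when reaching into an element
x[[2]][3]
#> [1] "tercio"
```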

Even simpler list:

# Construct
simpler_list <- list(TRUE, FALSE, 
                    1, 2, 3, 4, 5, 
                    1.2, 2.3, 3.4, 
                    "primo", "secundo", "tercio")

# Accessing
simpler_list[1]
#> [[1]]
#> [1] TRUE
simpler_list[5]
#> [[1]]
#> [1] 3
simpler_list[9]
#> [[1]]
#> [1] 2.3
simpler_list[11]
#> [[1]]
#> [1] "primo"
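
The two constructions differ in how many elements the list itself has. A small sketch, rebuilding both lists so it runs on its own: length() counts the list's elements, and lengths() reports the length of each element.

```r
simple_list <- list(
  c(TRUE, FALSE), 1:20, c(1.2, 2.3, 3.4), c("primo", "secundo", "tercio")
)
simpler_list <- list(TRUE, FALSE, 1, 2, 3, 4, 5, 1.2, 2.3, 3.4,
                     "primo", "secundo", "tercio")

# four elements, each a vector
length(simple_list)
#> [1] 4
lengths(simple_list)
#> [1]  2 20  3  3

# thirteen elements, each of length one
length(simpler_list)
#> [1] 13
all(lengths(simpler_list) == 1)
#> [1] TRUE
```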

Nested lists:

nested_list <- list(
  # first level
  list(
    # second level
    list(
      # third level
      list(1)
    )
  )
)

str(nested_list)
#> List of 1
#>  $ :List of 1
#>   ..$ :List of 1
#>   .. ..$ :List of 1
#>   .. .. ..$ : num 1

This nesting resembles the hierarchical structure of JSON.
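
Reaching the innermost value takes one `[[` per level. A minimal sketch (the list is rebuilt so the snippet is self-contained); base R also accepts a vector index inside `[[`, which walks down the levels recursively.

```r
nested_list <- list(list(list(list(1))))

# one `[[` per level of nesting
nested_list[[1]][[1]][[1]][[1]]
#> [1] 1

# recursive indexing: each entry of the index vector descends one level
nested_list[[c(1, 1, 1, 1)]]
#> [1] 1
```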

Combined lists:

# with list()
list_comb1 <- list(list(1, 2), list(3, 4))
# with c()
list_comb2 <- c(list(1, 2), list(3, 4))

# compare structure
str(list_comb1)
#> List of 2
#>  $ :List of 2
#>   ..$ : num 1
#>   ..$ : num 2
#>  $ :List of 2
#>   ..$ : num 3
#>   ..$ : num 4
str(list_comb2)
#> List of 4
#>  $ : num 1
#>  $ : num 2
#>  $ : num 3
#>  $ : num 4

# does this work if they are different data types?
list_comb3 <- c(list(1, 2), list(TRUE, FALSE))
str(list_comb3)
#> List of 4
#>  $ : num 1
#>  $ : num 2
#>  $ : logi TRUE
#>  $ : logi FALSE
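
A related case, sketched here with the made-up names `list_comb4` and `list_comb5`: when c() gets a list and an atomic vector, the vector is coerced to a list first, so every value becomes its own element. list() instead keeps the vector as one element.

```r
# with c(): the atomic vector c(3, 4) is split into list elements
list_comb4 <- c(list(1, 2), c(3, 4))
str(list_comb4)
#> List of 4
#>  $ : num 1
#>  $ : num 2
#>  $ : num 3
#>  $ : num 4

# with list(): the atomic vector stays whole
list_comb5 <- list(list(1, 2), c(3, 4))
length(list_comb5)
#> [1] 2
```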

3.6.2 Testing

Check that an object is a list:

  • is.list()
  • rlang::is_list()

The two do the same thing, except that the latter can also check the number of elements.

# is list
base::is.list(list_comb2)
#> [1] TRUE
rlang::is_list(list_comb2)
#> [1] TRUE

# is list of 4 elements
rlang::is_list(x = list_comb2, n = 4)
#> [1] TRUE

# is a vector (of a special type)
# remember the family tree?
rlang::is_vector(list_comb2)
#> [1] TRUE
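
The same checks can be sketched with base R only (`x` is a hypothetical example object): a list is a vector, but not an atomic one, and a data frame is a list as well.

```r
x <- list(1, "a")

is.list(x)
#> [1] TRUE
is.vector(x)
#> [1] TRUE
is.atomic(x)
#> [1] FALSE

# atomic vectors are not lists
is.list(1:3)
#> [1] FALSE

# data frames are built on lists
is.list(mtcars)
#> [1] TRUE
```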

3.6.3 Coercion

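Use as.list(). Whereas list() wraps its argument in a one-element list, as.list() turns each element of the vector into its own list element: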

list(1:3)
#> [[1]]
#> [1] 1 2 3
as.list(1:3)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
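
Two further properties, shown as a small sketch with a made-up named vector: as.list() carries names over to the list, and unlist() reverses the conversion for simple inputs.

```r
named <- c(a = 1, b = 2)

# names are kept
str(as.list(named))
#> List of 2
#>  $ a: num 1
#>  $ b: num 2

# round trip back to the atomic vector
identical(unlist(as.list(1:3)), 1:3)
#> [1] TRUE
```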

3.6.4 Matrices and arrays

Although not often used, the dimension attribute can be added to create list-matrices or list-arrays.

l <- list(1:3, "a", TRUE, 1.0)
dim(l) <- c(2, 2)
l
#>      [,1]      [,2]
#> [1,] integer,3 TRUE
#> [2,] "a"       1

l[[1, 1]]
#> [1] 1 2 3
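
As an alternative sketch, matrix() can build the same kind of object in one step (`m` is a made-up name). Note that `[[` returns the stored element, whereas `[` on a row returns a list.

```r
m <- matrix(list(1:3, "a", TRUE, 1.0), nrow = 2)

typeof(m)
#> [1] "list"
dim(m)
#> [1] 2 2

# `[[` extracts the element itself
m[[1, 1]]
#> [1] 1 2 3

# `[` on a row gives a list of that row's elements
str(m[2, ])
#> List of 2
#>  $ : chr "a"
#>  $ : num 1
```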

3.6.5 Exercises

  1. List all the ways that a list differs from an atomic vector.
Answer(s)
  • Atomic vectors are always homogeneous (all elements must be of the same type). Lists may be heterogeneous (the elements can be of different types) as described in the introduction of the vectors chapter.
  • Atomic vectors point to one address in memory, while lists contain a separate reference for each element. (This was described in the list sections of the vectors and the names and values chapters.)
lobstr::ref(1:2)
#> [1:0x7fcd936f6e80] <int>
lobstr::ref(list(1:2, 2))
#> █ [1:0x7fcd93d53048] <list> 
#> ├─[2:0x7fcd91377e40] <int> 
#> └─[3:0x7fcd93b41eb0] <dbl>
  • Subsetting with out-of-bounds and NA values leads to different output. For example, [ returns NA for atomics and NULL for lists. (This is described in more detail within the subsetting chapter.)
# Subsetting atomic vectors
(1:2)[3]
#> [1] NA
(1:2)[NA]
#> [1] NA NA

# Subsetting lists
as.list(1:2)[3]
#> [[1]]
#> NULL
as.list(1:2)[NA]
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
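
For contrast, a hedged sketch of `[[` with an out-of-bounds integer index: unlike `[`, it raises an error for both atomic vectors and lists. tryCatch() is used here only to capture the condition so the snippet runs to completion.

```r
res_atomic <- tryCatch((1:2)[[3]], error = function(e) e)
res_list   <- tryCatch(as.list(1:2)[[3]], error = function(e) e)

# both calls signalled an error instead of returning NA or NULL
inherits(res_atomic, "error")
#> [1] TRUE
inherits(res_list, "error")
#> [1] TRUE
```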
  2. Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?
Answer(s)

A list is already a vector, though not an atomic one, so as.vector() has nothing to convert. Note that as.vector() and is.vector() use different definitions of “vector”!

is.vector(as.vector(mtcars))
#> [1] FALSE
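
The point can be made concrete with a plain list (a minimal sketch; `x` is a made-up name): as.vector() leaves the list as it is, while unlist() flattens it into an atomic vector.

```r
x <- list(1, 2, 3)

# still a list: it already counts as a vector
typeof(as.vector(x))
#> [1] "list"

# flattened to an atomic (double) vector
typeof(unlist(x))
#> [1] "double"
unlist(x)
#> [1] 1 2 3
```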
  3. Compare and contrast c() and unlist() when combining a date and date-time into a single vector.
Answer(s)

Date and date-time objects are both built upon doubles. While dates store the number of days since the reference date 1970-01-01 (also known as “the Epoch”), date-time objects (POSIXct) store the time difference to this date in seconds.

date    <- as.Date("1970-01-02")
dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "UTC")

# Internal representations
unclass(date)
#> [1] 1
unclass(dttm_ct)
#> [1] 3600
#> attr(,"tzone")
#> [1] "UTC"
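
The two units can be related with a little arithmetic. A small sketch, where `date_secs` and `midnight` are made-up names: one day after the Epoch corresponds to 24 * 60 * 60 seconds on the POSIXct scale.

```r
date <- as.Date("1970-01-02")

# days since the Epoch, converted to seconds
date_secs <- as.numeric(date) * 24 * 60 * 60
date_secs
#> [1] 86400

# the same instant as a date-time, measured in seconds
midnight <- as.POSIXct("1970-01-02 00:00", tz = "UTC")
as.numeric(midnight)
#> [1] 86400
```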

As the c() generic only dispatches on its first argument, combining date and date-time objects via c() could lead to surprising results in older R versions (pre R 4.0.0):

# Output in R version 3.6.2
c(date, dttm_ct)  # equal to c.Date(date, dttm_ct) 
#> [1] "1970-01-02" "1979-11-10"
c(dttm_ct, date)  # equal to c.POSIXct(dttm_ct, date)
#> [1] "1970-01-01 02:00:00 CET" "1970-01-01 01:00:01 CET"

In the first statement above, c.Date() is executed, which incorrectly treats the underlying double of dttm_ct (3600) as days instead of seconds. Conversely, when c.POSIXct() is called on a date, each day is counted as only one second.

We can highlight these mechanics with the following code:

# Output in R version 3.6.2
unclass(c(date, dttm_ct))  # internal representation
#> [1] 1 3600
date + 3599
#> [1] "1979-11-10"

As of R 4.0.0 these issues have been resolved and both methods now convert their input first into POSIXct and Date, respectively.

c(dttm_ct, date)
#> [1] "1970-01-01 01:00:00 UTC" "1970-01-02 00:00:00 UTC"
unclass(c(dttm_ct, date))
#> [1]  3600 86400

c(date, dttm_ct)
#> [1] "1970-01-02" "1970-01-01"
unclass(c(date, dttm_ct))
#> [1] 1 0

However, as c() strips the time zone (and other attributes) of POSIXct objects, some caution is still recommended.

(dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "HST"))
#> [1] "1970-01-01 01:00:00 HST"
attributes(c(dttm_ct))
#> $class
#> [1] "POSIXct" "POSIXt"

A package that deals with these kinds of problems in more depth and provides a structural solution for them is the {vctrs} package, which is also used throughout the tidyverse.

Let’s look at unlist(), which operates on list input.

# Attributes are stripped
unlist(list(date, dttm_ct))  
#> [1]     1 39600

We see again that dates and date-times are internally stored as doubles. Unfortunately, this is all we are left with when unlist() strips the attributes of the list elements.

To summarise: c() coerces types and strips time zones. Errors may have occurred in older R versions because of inappropriate method dispatch/immature methods. unlist() strips attributes.
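
When all elements of a list share one class, two base R idioms keep or restore it. This is a sketch for a list of dates only (`dates` is a made-up name), not a general solution for mixed date and date-time input.

```r
dates <- list(as.Date("1970-01-02"), as.Date("1970-01-03"))

# unlist() leaves only the underlying doubles
unlist(dates)
#> [1] 1 2

# do.call() passes the elements to c(), which dispatches to c.Date()
do.call(c, dates)
#> [1] "1970-01-02" "1970-01-03"

# or rebuild the class from the raw day counts
as.Date(unlist(dates), origin = "1970-01-01")
#> [1] "1970-01-02" "1970-01-03"
```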