13.7 Augmented vectors

  • Atomic vectors and lists are building blocks for other important vector types like factors and dates.
    • These are called augmented vectors because they are vectors with additional attributes, including class.
      • Because augmented vectors have a class, they behave differently to the atomic vector on which they are built.
  • Here, we will discuss four important augmented vectors:
    • Factors
    • Dates
    • Date-times
    • Tibbles

13.7.1 Factors

  • They are designed to represent categorical data that can take a fixed set of possible values.
  • They are built on top of integers and have a levels attribute:
x_factor <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))
typeof(x_factor)
attributes(x_factor)
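
To make the "integers plus a levels attribute" structure concrete, here is a minimal sketch using only base R. The variable name codes is introduced purely for illustration.

```r
x_factor <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

# The underlying data are integer codes that index into the levels
codes <- as.integer(x_factor)
codes

# Looking the codes up in the levels recovers the original labels
levels(x_factor)[codes]

# A level can exist without appearing in the data ("ef" has a count of 0)
table(x_factor)
```

The unused level "ef" is what separates a factor from a plain character vector: the set of allowed values is stored with the data.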

13.7.2 Dates and date-times

  • In R, dates are numeric vectors that represent the number of days since 1 January 1970.
x_date <- as.Date("1971-01-01")
unclass(x_date)

typeof(x_date)

attributes(x_date)
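
As a rough check on the "days since 1 January 1970" idea, the sketch below converts in both directions using base R. The names days and y_date are illustrative only.

```r
x_date <- as.Date("1971-01-01")

# 1970 was not a leap year, so 1 January 1971 is day 365
days <- as.numeric(x_date)
days

# Going the other way: attach the class to a plain number of days
y_date <- structure(365, class = "Date")
y_date

# Both objects refer to the same day
x_date == y_date
```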
  • Date-times are numeric vectors with class POSIXct that represent the number of seconds since 1 January 1970.

Note: “POSIXct” stands for “Portable Operating System Interface”, calendar time.

x_date_time <- lubridate::ymd_hm("1970-01-01 01:00")
unclass(x_date_time)

typeof(x_date_time)

attributes(x_date_time)

The tzone attribute is optional. It controls how the time is printed, not what absolute time it refers to.

attr(x_date_time, "tzone") <- "US/Pacific"
x_date_time

attr(x_date_time, "tzone") <- "US/Eastern"
x_date_time
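
One way to convince ourselves that tzone only affects printing is to compare the underlying number of seconds before and after changing it. This sketch uses base R's as.POSIXct() as a stand-in for lubridate::ymd_hm(), so that it runs without extra packages. The names before and after are illustrative.

```r
# Roughly the base R counterpart of lubridate::ymd_hm("1970-01-01 01:00")
x_date_time <- as.POSIXct("1970-01-01 01:00", tz = "UTC")
before <- as.numeric(x_date_time)

# Change only the display time zone
attr(x_date_time, "tzone") <- "US/Pacific"
after <- as.numeric(x_date_time)

# Same instant, different display
before == after
```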

Qn: how to find the other time zones?
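
A possible answer, assuming the R installation ships with the usual time zone database: base R's OlsonNames() returns the names it recognises, and Sys.timezone() reports the current session's zone. The name tz_names is illustrative.

```r
# Names of the time zones known to this R installation
tz_names <- OlsonNames()
length(tz_names)
head(tz_names)

# Filter for zones under the "US/" prefix
grep("^US/", tz_names, value = TRUE)

# The time zone of the current session
Sys.timezone()
```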

  • Another type of date-time, POSIXlt, is built on top of named lists:
y_date_time <- as.POSIXlt(x_date_time)
typeof(y_date_time)
attributes(y_date_time)

POSIXlts are rare inside the tidyverse, but they do pop up in base R because they are needed to extract specific components of a date, like the year or month.

Since lubridate provides helpers for us to do this instead, we don’t need them.

POSIXcts are always easier to work with, so if we have a POSIXlt, we should always convert it to a regular date-time with lubridate::as_datetime().
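
The sketch below shows, in base R only, what "built on top of named lists" means in practice and how to get back to a POSIXct. The example timestamp and the names x_ct, x_lt, yr, mo and back are made up for illustration. With lubridate loaded, lubridate::year() and lubridate::month() would give the components directly from the POSIXct.

```r
x_ct <- as.POSIXct("2020-03-15 12:30:00", tz = "UTC")
x_lt <- as.POSIXlt(x_ct)

# POSIXlt stores each component as a named list element
typeof(x_lt)
yr <- x_lt$year + 1900  # years are stored as an offset from 1900
mo <- x_lt$mon + 1      # months are stored as 0-11

# Convert back to POSIXct; the instant in time is unchanged
back <- as.POSIXct(x_lt)
as.numeric(back) == as.numeric(x_ct)
```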

13.7.3 Tibbles

  • Tibbles are augmented lists.
    • They have class “tbl_df” + “tbl” + “data.frame”, and names (column) and row.names attributes:
tb <- tibble::tibble(x = 1:5, y = 5:1)
typeof(tb)
attributes(tb)
  • The difference between a tibble and a list:
    • All the elements of a data frame must be vectors with the same length.
      • All functions that work with tibbles enforce this constraint.
  • Traditional data.frames have a very similar structure:
df <- data.frame(x = 1:5, y = 5:1)
typeof(df)
attributes(df)

The main difference is the class.

The class of tibble includes “data.frame” which means tibbles inherit the regular data frame behaviour by default.
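
A small sketch of what that inheritance looks like, assuming the tibble package is installed. Because "data.frame" appears in the class vector, functions written for data frames also accept tibbles.

```r
tb <- tibble::tibble(x = 1:5, y = 5:1)
df <- data.frame(x = 1:5, y = 5:1)

# Compare the class vectors
class(tb)
class(df)

# Both are lists of equal-length columns underneath
typeof(tb)
typeof(df)

# Because "data.frame" is in the class vector, data frame methods apply
inherits(tb, "data.frame")
nrow(tb)
```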

13.7.4 Exercises

  1. What does hms::hms(3600) return? How does it print? What primitive type is the augmented vector built on top of? What attributes does it use?
(x <- hms::hms(3600))
class(x)

attributes(x)

hms::hms() returns an object of class “hms” (which also inherits from “difftime”), and prints the time in “%H:%M:%S” format.

It is built on top of a double, and the attributes it uses are units and class.
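
To check the primitive type directly, a short sketch, assuming the hms package is installed:

```r
x <- hms::hms(3600)

# The underlying storage type
typeof(x)

# Stripping the class leaves the number of seconds plus the units attribute
unclass(x)

# The raw number of seconds
as.numeric(x)
```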

  2. Try and make a tibble that has columns with different lengths. What happens?
tibble::tibble(x = 1, y = 1:5)

The “scalar” 1 is recycled to the length of the longer vector.

tibble::tibble(x = 1:3, y = 1:4)

Creating a tibble with two vectors of different lengths will give an error.
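
To explore the error without halting a script, one option is to wrap the call in tryCatch(). The sketch below also contrasts tibble with base R's data.frame(), which recycles a shorter column when its length divides the longer one evenly. It assumes the tibble package is installed; msg, df and tb_ok are illustrative names.

```r
# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    tibble::tibble(x = 1:3, y = 1:4)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Base R data frames are more permissive: a shorter column is recycled
# when its length divides the longer one evenly
df <- data.frame(x = 1:2, y = 1:4)
nrow(df)

# tibble() refuses the same input, because only length-1 vectors are recycled
tb_ok <- tryCatch(
  {
    tibble::tibble(x = 1:2, y = 1:4)
    TRUE
  },
  error = function(e) FALSE
)
tb_ok
```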

  3. Based on the definition above, is it ok to have a list as a column of a tibble?
tibble::tibble(x = 1:3, y = list("a", 1, list(1:3)))

Tibbles can have columns of different types: atomic vectors (possibly with additional attributes) such as doubles, characters, integers, logicals, factors, and dates. Hence, a column can also be a list, as long as it is the same length as the other columns!
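
A brief sketch of how such a list-column can be inspected, assuming the tibble package is installed:

```r
tb <- tibble::tibble(x = 1:3, y = list("a", 1, list(1:3)))

# Column types: x is an integer vector, y is a list
sapply(tb, typeof)

# Each element of the list-column can hold a different kind of object
sapply(tb$y, class)

# Use [[ to pull a single element out of the list-column
tb$y[[3]]
```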