13.7 Augmented vectors
- Atomic vectors and lists are the building blocks for other important vector types like factors and dates.
- These are called augmented vectors because they are vectors with additional attributes, including class.
- Because augmented vectors have a class, they behave differently to the atomic vector on which they are built.
- Here, we will discuss four important augmented vectors:
- Factors
- Dates
- Date-times
- Tibbles
13.7.1 Factors
- Factors are designed to represent categorical data that can take a fixed set of possible values.
- They are built on top of integer vectors and have a levels attribute.
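The integer-plus-levels structure described above can be checked directly. This is a minimal sketch in base R; the values "ab", "cd" and "ef" are arbitrary illustrations.

```r
# Build a factor from a character vector, fixing the set of allowed levels.
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

typeof(x)      # "integer": the underlying storage type
attributes(x)  # the levels ("ab", "cd", "ef") and the class ("factor")
unclass(x)     # the integer codes 1 2 1, with the levels attribute still attached
```

Note that "ef" is a valid level even though it never appears in the data, which is what "a fixed set of possible values" means in practice.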
13.7.2 Dates and date-times
- In R, dates are numeric vectors that represent the number of days since 1 January 1970.
- Date-times are numeric vectors with class POSIXct that represent the number of seconds since 1 January 1970.
Note: “POSIXct” stands for “Portable Operating System Interface”, calendar time.
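Before looking at a date-time, the day-count representation of a plain date can be checked in the same way. This is a small sketch in base R; the date 1971-01-01 is an arbitrary choice that falls exactly one non-leap year after the epoch.

```r
# A Date is a double that counts days since 1970-01-01.
x_date <- as.Date("1971-01-01")

unclass(x_date)     # 365
typeof(x_date)      # "double"
attributes(x_date)  # only a class attribute: "Date"
```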
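x_date_time <- lubridate::ymd_hm("1970-01-01 01:00")
unclass(x_date_time)     # 3600 seconds since the epoch, plus a tzone attribute
typeof(x_date_time)      # "double"
attributes(x_date_time)  # class "POSIXct" "POSIXt" and tzone "UTC"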
The tzone attribute is optional. It controls how the time is printed, not what absolute time it refers to.
attr(x_date_time, "tzone") <- "US/Pacific"
x_date_time
attr(x_date_time, "tzone") <- "US/Eastern"
x_date_time
Question: how can we find the other available time zone names?
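One possible answer to the question above is base R's OlsonNames(), which lists the time zone names known to the current system. This is a sketch only: the exact set of names depends on the time zone database installed on the machine.

```r
# List the time zone names available on this system.
tz_names <- OlsonNames()

length(tz_names)                      # how many names are known
head(tz_names)                        # the first few names
grep("^US/", tz_names, value = TRUE)  # names that start with "US/", if any
Sys.timezone()                        # the time zone of the current session
```

Any name returned here can be assigned to the tzone attribute in the same way as "US/Pacific" above.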
- There is another type of date-time called POSIXlt, which is built on top of named lists:
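The named-list structure of a POSIXlt can be inspected with base R. This is a minimal sketch; the variable name y and the one-hour offset are illustrative, and the exact set of list components can vary slightly between R versions.

```r
# Create a POSIXlt for one hour after the epoch, in UTC.
y <- as.POSIXlt(as.POSIXct(3600, origin = "1970-01-01", tz = "UTC"))

typeof(y)          # "list"
names(unclass(y))  # component names such as sec, min, hour, mday, mon, year
y$hour             # 1
y$year + 1900      # 1970 (years are stored as an offset from 1900)
```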
POSIXlts are rare inside the tidyverse. They do crop up in base R, because they are needed to extract specific components of a date, like the year or month.
Since lubridate provides helpers for us to do this instead, we don’t need them.
POSIXcts are always easier to work with, so if we have a POSIXlt, we should always convert it to a regular date-time with lubridate::as_datetime().
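As a rough illustration of the conversion, the sketch below uses base R's as.POSIXct() rather than the lubridate helper named above, so that it runs without extra packages. The variable names are illustrative.

```r
# Start from a list-based POSIXlt.
x_lt <- as.POSIXlt("1970-01-01 01:00:00", tz = "UTC")
typeof(x_lt)   # "list"

# Convert to the numeric POSIXct representation.
x_ct <- as.POSIXct(x_lt)
class(x_ct)    # "POSIXct" "POSIXt"
typeof(x_ct)   # "double"
as.numeric(x_ct)  # 3600 seconds since the epoch
```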
13.7.3 Tibbles
- Tibbles are augmented lists.
- They have the classes “tbl_df”, “tbl”, and “data.frame”, plus names (column names) and row.names attributes.
- The difference between a tibble and a list:
- All the elements of a data frame must be vectors with the same length.
- All functions that work with tibbles enforce this constraint.
- Traditional data.frames have a very similar structure:
The main difference is the class.
The class of a tibble includes “data.frame”, which means tibbles inherit the regular data frame behaviour by default.
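The comparison above can be made concrete by inspecting both objects. This is a small sketch that assumes the tibble package is installed; the column names x and y are illustrative.

```r
# A tibble and a traditional data frame holding the same columns.
tb <- tibble::tibble(x = 1:5, y = 5:1)
df <- data.frame(x = 1:5, y = 5:1)

typeof(tb)      # "list"
attributes(tb)  # names, row.names, and class "tbl_df" "tbl" "data.frame"

typeof(df)      # "list"
attributes(df)  # names, row.names, and class "data.frame"

# Because "data.frame" is part of its class, a tibble passes this check.
inherits(tb, "data.frame")  # TRUE
```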
13.7.4 Exercises
- What does hms::hms(3600) return? How does it print? What primitive type is the augmented vector built on top of? What attributes does it use?
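hms::hms(3600) returns an object of class hms (which also inherits from difftime) and prints the time in “%H:%M:%S” format, here 01:00:00.
It is built on top of a double vector.
The attributes it uses are units and class.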
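The answer above can be verified with a few inspection calls. This sketch assumes the hms package is installed.

```r
x <- hms::hms(3600)

x              # prints 01:00:00
class(x)       # "hms" "difftime"
typeof(x)      # "double"
attributes(x)  # units ("secs") and class
```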
- Try and make a tibble that has columns with different lengths. What happens?
A length-1 vector (a “scalar”) is recycled to the length of the longer column.
Creating a tibble from two vectors of different lengths, neither of which has length 1, gives an error.
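Both cases can be tried out as follows. This is a sketch that assumes the tibble package is installed; tryCatch() is used only so that the error is captured as a message instead of stopping the script.

```r
# Case 1: a length-1 column is recycled to match the longer column.
recycled <- tibble::tibble(x = 1, y = 1:5)
nrow(recycled)  # 5
recycled$x      # 1 1 1 1 1

# Case 2: lengths 3 and 4 cannot be reconciled, so tibble() signals an error.
mismatch <- tryCatch(
  tibble::tibble(x = 1:3, y = 1:4),
  error = function(e) conditionMessage(e)
)
is.character(mismatch)  # TRUE: an error was raised and its message captured
```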
- Based on the definition above, is it ok to have a list as a column of a tibble?
Tibble columns can be atomic vectors (with additional attributes) of different types: double, character, integer, logical, factor, date. Hence a list can also be a column, as long as it has the same length as the other columns.
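A list-column can be created directly, as in this minimal sketch. It assumes the tibble package is installed, and the list contents are arbitrary.

```r
# y is a list with one element per row; the elements may have different types.
tb <- tibble::tibble(x = 1:3, y = list("a", 1, list(1:3)))

typeof(tb$y)  # "list"
length(tb$y)  # 3, matching the length of x
nrow(tb)      # 3
```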