Advanced R
Chapter 3 - Vectors
Orry Messer 4 R4DS Reading Group, Cohort 3
@orrymr
16/08/2020

Introduction

Vectors are the most important family of data types in base R
Vectors come in two (delicious) flavours:

Atomic Vectors
All elements must have the same type

Lists
Elements can have different types

NULL? Not a vector, but closely related: it serves the role of a generic zero-length vector (more on that later)

Attributes (named list of arbitrary metadata). Two particularly important attributes:
- dimension (turns vectors into matrices and arrays)
- class (powers S3)
Factors, dates, date-times, data frames and tibbles are all S3 objects!


Outline

3.2 Atomic Vectors
3.3 Attributes
3.4 S3 Atomic Vectors
3.5 Lists
3.6 Data Frames and Tibbles
3.7 NULL


Atomic Vectors

Four primary types of atomic vectors:
- logical
- integer
- double
- character

Two rare types:
- complex
- raw

lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c('these are', "some strings")

Each of the four atomic vectors above holds elements of a single type. Use typeof() to determine the type... of.
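A quick check of the four vectors above with typeof(), plus one example of each rare type. The particular complex and raw values are illustrative choices, not taken from the original slides.

```r
# The four primary atomic types, as created above
lgl_var <- c(TRUE, FALSE)
int_var <- c(1L, 6L, 10L)
dbl_var <- c(1, 2.5, 4.5)
chr_var <- c('these are', "some strings")

typeof(lgl_var)  # "logical"
typeof(int_var)  # "integer"
typeof(dbl_var)  # "double"
typeof(chr_var)  # "character"

# The two rare types (example values chosen for illustration)
cpl_var <- c(1+2i, 3-1i)      # complex numbers use the i suffix
raw_var <- as.raw(c(1, 255))  # raw vectors store bytes: 01 ff
typeof(cpl_var)  # "complex"
typeof(raw_var)  # "raw"
```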


NA's, Testing and Coercion

NAs (which R uses for missing values) are infectious: most computations involving an NA return NA.
Test whether a vector is of a given type with the is.*() functions, for example is.integer()
Every element of an atomic vector must have the same type
- So, when combining different types, they are coerced in a fixed order, character -> double -> integer -> logical: the result takes whichever of these types comes first in the chain

c(TRUE)

## [1] TRUE

c(TRUE, 42L)

## [1]  1 42

c(TRUE, 42L, 3.14)

## [1]  1.00 42.00  3.14

c(TRUE, 42L, 3.14, "elephant")

## [1] "TRUE"     "42"       "3.14"     "elephant"
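A minimal sketch of the three ideas on this slide: how NAs propagate (and the few cases where they don't), testing types with is.*(), and explicit coercion with as.*(). The input values are arbitrary examples.

```r
# NAs propagate through most computations...
NA > 5        # NA
10 * NA       # NA
!NA           # NA
# ...except when the answer does not depend on the missing value
NA ^ 0        # 1
NA | TRUE     # TRUE
NA & FALSE    # FALSE

# == cannot detect NAs; use is.na() instead
x <- c(NA, 5, NA, 10)
x == NA       # NA NA NA NA
is.na(x)      # TRUE FALSE TRUE FALSE

# Type tests
is.integer(1L)     # TRUE
is.integer(1)      # FALSE: 1 is a double
is.character("a")  # TRUE

# Functions also coerce: sum() treats TRUE as 1 and FALSE as 0
sum(c(TRUE, FALSE, TRUE))  # 2

# Explicit coercion with as.*(); failed conversions become NA.
# suppressWarnings() silences the "NAs introduced by coercion" warning.
coerced <- suppressWarnings(as.integer(c("1", "1.5", "a")))
coerced       # 1 1 NA
```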


NA vs NULL?

NULL
- Has unique type (NULL)
- Length 0
- Can't have attributes
- Used to represent an empty vector
- Used to represent an absent vector (for example, as a default function argument)
NA
- NA indicates that an element of a vector is absent
- Confusingly, SQL NULL is equivalent to R's NA
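The contrast between NULL and NA can be sketched directly. The function greet() below is hypothetical, written only to show NULL as a default argument meaning "not supplied".

```r
# NULL: its own type, always length 0
typeof(NULL)   # "NULL"
length(NULL)   # 0
is.null(c())   # TRUE: c() with no arguments returns NULL

# NA: a length-1 logical placeholder for one missing element
typeof(NA)     # "logical"
length(NA)     # 1

# NULL vanishes when combined; NA keeps its slot
c(1, NULL, 2)  # 1 2
c(1, NA, 2)    # 1 NA 2

# NULL as the default for an absent argument (hypothetical function)
greet <- function(name = NULL) {
  if (is.null(name)) "Hello!" else paste0("Hello, ", name, "!")
}
greet()        # "Hello!"
greet("R")     # "Hello, R!"
```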


Attributes

Name-value pairs that attach metadata to an object
Get/Set individual attributes with attr(), thusly:

a <- 1:3
attr(a, "x") <- "abcdef"
attr(a, "x")

## [1] "abcdef"

Get/Set en masse with attributes()/structure(), respectively:

a <- structure(
  1:3, 
  x = "abcdef",
  y = "why?"
)
attributes(a)

## $x
## [1] "abcdef"
## 
## $y
## [1] "why?"
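The same two functions can also remove attributes: assigning NULL through attr() drops a single attribute, and assigning NULL to attributes() drops them all. A short sketch reusing the object a from above:

```r
a <- structure(1:3, x = "abcdef", y = "why?")

# Setting one attribute to NULL removes it
attr(a, "y") <- NULL
attributes(a)   # only $x remains

# Assigning NULL to attributes() strips every attribute
attributes(a) <- NULL
a               # 1 2 3, a plain integer vector
```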


Attributes (Generally) Ephemeral (1)

Using the variable a defined on the previous slide:

attributes(a)

## $x
## [1] "abcdef"
## 
## $y
## [1] "why?"

attributes(a[1])

## NULL

attributes(sum(a))

## NULL


Attributes (Generally) Ephemeral (2)

Only 2 attributes are routinely preserved:
- names, a character vector giving each element a name
- dim, an integer vector used to turn vectors into matrices/arrays

To preserve other attributes, you need to create your own S3 class
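A small sketch of which attributes survive subsetting. The attribute name "note" is made up for the example.

```r
x <- c(a = 1, b = 2, c = 3)
attr(x, "note") <- "hypothetical extra attribute"

y <- x[1:2]
names(y)         # "a" "b": names survive subsetting
attr(y, "note")  # NULL: the extra attribute is dropped

m <- matrix(1:6, nrow = 2)
dim(m[, 1:2])    # 2 2: dim survives while the result is still 2-d
```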

names()

3 ways to name a vector:

# When creating it: 
x <- c(a = 1, b = 2, c = 3)
# By assigning a character vector to names()
x <- 1:3
names(x) <- c("a", "b", "c")
# Inline, with setNames():
x <- setNames(1:3, c("a", "b", "c"))
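Whichever way they are set, names are stored as an ordinary attribute, and they can be removed with unname() or by assigning NULL to names(). A brief sketch:

```r
x <- setNames(1:3, c("a", "b", "c"))
attributes(x)      # $names: "a" "b" "c"

# Two ways to remove names
unname(x)          # 1 2 3
names(x) <- NULL
x                  # 1 2 3
```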


dim()

Adding a dim attribute to a vector allows it to behave like a 2-dimensional matrix or a multi-dimensional array.

# Two scalar arguments specify row and column sizes
a <- matrix(1:6, nrow = 2, ncol = 3)
dim(a)

## [1] 2 3

b <- array(1:12, c(2, 3, 2))
dim(b)

## [1] 2 3 2

c <- 1:6
dim(c) <- c(3,2)

A vector without a dim attribute set is often thought of as 1-dimensional, but actually has NULL dimensions.
You can also have matrices with a single row or a single column, or arrays with a single dimension.
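These "degenerate" shapes print much like a plain vector, but dim() and str() reveal the difference. A short sketch:

```r
dim(1:3)                          # NULL: a plain vector has no dimensions

row_mat <- matrix(1:3, nrow = 1)  # 1 x 3 matrix
col_mat <- matrix(1:3, ncol = 1)  # 3 x 1 matrix
arr_1d  <- array(1:3, 3)          # 1-d array

dim(row_mat)   # 1 3
dim(col_mat)   # 3 1
dim(arr_1d)    # 3

# str() makes the shapes visible
str(1:3)       # int [1:3] 1 2 3
str(row_mat)   # int [1, 1:3] 1 2 3
str(col_mat)   # int [1:3, 1] 1 2 3
str(arr_1d)    # int [1:3(1d)] 1 2 3
```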


S3 Atomic Vectors

Having a class attribute turns an object into an S3 object
This means it will behave differently from a regular vector when passed to a generic function
4 important S3 vectors in base R
- factor
- Date
- POSIXct
- difftime
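To make "behaves differently when passed to a generic" concrete, here is a sketch with a made-up class, celsius, which is hypothetical and not part of base R. The same doubles print differently once a class attribute is attached and a print() method exists for that class.

```r
temps <- c(20.5, 25)
temps_c <- structure(temps, class = "celsius")  # hypothetical class

# A print method for the hypothetical class; print() dispatches on class
print.celsius <- function(x, ...) {
  cat(paste0(unclass(x), "C", collapse = " "), "\n", sep = "")
  invisible(x)
}

print(temps)     # [1] 20.5 25.0
print(temps_c)   # 20.5C 25C
typeof(temps_c)  # "double": still a double vector underneath
```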


Factors (1)

Used to store categorical data
Can only contain predefined values
Built on top of an integer vector, with two attributes: class = "factor", and levels, which defines the allowed values.

x <- factor(c("a", "b", "b", "a"))
x

## [1] a b b a
## Levels: a b

typeof(x)

## [1] "integer"

attributes(x)

## $levels
## [1] "a" "b"
## 
## $class
## [1] "factor"
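A sketch of what the integer-plus-levels design implies in practice: the codes are visible with as.integer(), values outside the levels become NA, and unobserved levels are still counted. The "m"/"f" example data are illustrative.

```r
x <- factor(c("a", "b", "b", "a"))
as.integer(x)    # 1 2 2 1: the underlying integer codes

# A value outside the levels becomes NA (R warns "invalid factor level")
x2 <- x
suppressWarnings(x2[2] <- "c")
x2               # a <NA> b a

# Levels are kept even when unobserved, which table() reports
sex <- factor(c("m", "m", "m"), levels = c("m", "f"))
table(sex)       # m: 3, f: 0
```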


Factors (2)

Ordered factors - order is meaningful

grade <- ordered(c("b", "b", "a", "c"), levels = c("c", "b", "a"))
grade

## [1] b b a c
## Levels: c < b < a
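Because the order is meaningful, comparisons and functions like max() and sort() follow the level order of an ordered factor. A plain factor, by contrast, does not support `<`. A short sketch:

```r
grade <- ordered(c("b", "b", "a", "c"), levels = c("c", "b", "a"))

grade < "a"    # TRUE TRUE FALSE TRUE: compared using the level order
max(grade)     # a
sort(grade)    # c b b a

# Unordered factors: < is not meaningful (R warns and returns NA)
plain <- factor(c("b", "a"))
suppressWarnings(plain < "b")  # NA NA
```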


Dates

Built on top of double vectors
Have class = "Date". No other attributes.

the_day_this_slide_was_rendered <- Sys.Date()
the_day_this_slide_was_rendered

## [1] "2020-08-20"

typeof(the_day_this_slide_was_rendered)

## [1] "double"

attributes(the_day_this_slide_was_rendered)

## $class
## [1] "Date"

unclass(the_day_this_slide_was_rendered) # Days since 1970-01-01

## [1] 18494
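Since a Date is just a count of days, arithmetic on it works in days. A minimal sketch using fixed dates, so the results don't depend on when it is run:

```r
unclass(as.Date("1970-02-01"))  # 31: days since 1970-01-01

d <- as.Date("2020-08-20")
d + 7                           # "2020-08-27"
format(d, "%A %d %B %Y")        # day and month names depend on the locale
```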


Date-times (1)

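Like dates, POSIXct date-times are built on double vectors (seconds since 1970-01-01)
- 2 representations: POSIXct (calendar time) vs POSIXlt (local time, stored as a list)
- We'll focus on POSIXct, the simpler of the two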

then_ct <- as.POSIXct("2018-08-01 22:00", tz = "UTC")
then_ct

## [1] "2018-08-01 22:00:00 UTC"

typeof(then_ct) # Let's not forget, it was built on a double vector

## [1] "double"

attributes(then_ct)

## $class
## [1] "POSIXct" "POSIXt" 
## 
## $tzone
## [1] "UTC"


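Date-times (2)

The tzone attribute controls only how the date-time is formatted, not the moment in time it represents
Why multiple classes? Both POSIXct and POSIXlt also carry the shared parent class POSIXt, so methods written for POSIXt apply to either representation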
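A sketch showing that changing tzone changes only the display. It assumes the system time zone database includes "Asia/Tokyo" (UTC+9, no daylight saving).

```r
then_ct <- as.POSIXct("2018-08-01 22:00", tz = "UTC")

# Same instant, displayed in a different time zone
then_tokyo <- structure(then_ct, tzone = "Asia/Tokyo")
format(then_ct, "%Y-%m-%d %H:%M")     # "2018-08-01 22:00"
format(then_tokyo, "%Y-%m-%d %H:%M")  # "2018-08-02 07:00"

# The underlying number of seconds is unchanged
as.numeric(then_ct) == as.numeric(then_tokyo)  # TRUE

# Both representations share the POSIXt parent class
class(then_ct)                # "POSIXct" "POSIXt"
class(as.POSIXlt(then_ct))    # "POSIXlt" "POSIXt"
```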

Durations

Represent the amount of time between pairs of dates or date-times
Built on top of doubles
Have a units attribute that determines how the number should be interpreted

one_week_1 <- as.difftime(1, units = "weeks")
one_week_1

## Time difference of 1 weeks

attributes(one_week_1)

## $class
## [1] "difftime"
## 
## $units
## [1] "weeks"

one_week_2 <- as.difftime(7, units = "days")
one_week_2

## Time difference of 7 days

attributes(one_week_2)

## $class
## [1] "difftime"
## 
## $units
## [1] "days"
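The two durations above describe the same span of time, which becomes clear once they are converted to a common unit. Subtracting two Dates also produces a difftime. The dates used below are illustrative.

```r
one_week_1 <- as.difftime(1, units = "weeks")
one_week_2 <- as.difftime(7, units = "days")

# Convert to a plain number in a chosen unit
as.numeric(one_week_1, units = "days")   # 7
as.numeric(one_week_2, units = "weeks")  # 1

# Subtracting two Dates yields a difftime in days
gap <- as.Date("2020-08-20") - as.Date("2020-08-13")
gap            # Time difference of 7 days
units(gap) <- "weeks"
gap            # Time difference of 1 weeks
```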


Lists (1)

Each element can be any type

Although technically, each element is the same type, because each is just a reference to another object (Section 2.3.3)
Because a list is made up of references, its total size may be much smaller than you expect:

lobstr::obj_size(mtcars)

## 7,208 B

l2 <- list(mtcars, mtcars, mtcars, mtcars)
lobstr::obj_size(l2)

## 7,288 B
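For contrast, base R's utils::object.size() does not account for shared references, so it reports roughly four times the size of mtcars for the same list. Exact byte counts vary with R version and platform, so the sketch compares sizes rather than hard-coding them.

```r
l2 <- list(mtcars, mtcars, mtcars, mtcars)

# object.size() sums each element separately, ignoring that all four
# elements refer to the same data frame
utils::object.size(mtcars)
utils::object.size(l2)
```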


Lists (2)

Recursive: a list can contain other lists

l3 <- list(list(list(1)))

l4 <- list(list(1, 2), c(3, 4))
str(l4)

## List of 2
##  $ :List of 2
##   ..$ : num 1
##   ..$ : num 2
##  $ : num [1:2] 3 4


Lists (3)

l5 <- c(list(1, 2), c(3, 4)) # If given a combination of atomic vectors and lists, c() coerces the vectors to lists before combining them
str(l5) #NB, it's a list, even though we called c()

## List of 4
##  $ : num 1
##  $ : num 2
##  $ : num 3
##  $ : num 4

l6 <- c(c(1, 2), c(3, 4))
str(l6) # Still an atomic vector...

##  num [1:4] 1 2 3 4

typeof() of a list is "list"
is.list() tests for a list
Coerce to a list with as.list()
List-matrices and list-arrays also exist (remember, we previously created matrices/arrays from atomic vectors)
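A short sketch of these list helpers, plus unlist(), which goes the other way and applies the usual coercion rules, and a small list-matrix. The example values are arbitrary.

```r
l <- list(1:3, "a", TRUE)
typeof(l)          # "list"
is.list(l)         # TRUE

# as.list() turns each element of an atomic vector into a list element
str(as.list(1:3))  # List of 3: int 1, int 2, int 3

# unlist() flattens back to an atomic vector, coercing as c() would
unlist(list(1, 2L, TRUE))  # 1 2 1 (a double vector)

# A list-matrix: a list with a dim attribute
lst_mat <- list(1:3, "a", TRUE, 1.0)
dim(lst_mat) <- c(2, 2)
lst_mat[[1, 1]]    # 1 2 3
```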


Data frames and tibbles

Data frames and tibbles are named lists of equal-length vectors
They are S3 vectors (see the "class" attribute)

df1 <- data.frame(x = 1:3, y = letters[1:3])
attributes(df1)

## $names
## [1] "x" "y"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3
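Because a data frame is a list underneath, its length is the number of columns, and data.frame() refuses columns whose lengths can't be reconciled. A brief sketch:

```r
df1 <- data.frame(x = 1:3, y = letters[1:3])

typeof(df1)     # "list"
length(df1)     # 2: length of the underlying list = number of columns
nrow(df1)       # 3
names(df1)      # "x" "y", same as colnames(df1)

# Columns must have the same length (3 is not a multiple of 2)
res <- tryCatch(data.frame(x = 1:3, y = 1:2), error = function(e) "error")
res             # "error"
```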


Tibbles (1)

Frustration with data frames led to tibbles

library(tibble)
df2 <- tibble(x = 1:3, y = letters[1:3]) # still a list of vectors
attributes(df2)

## $names
## [1] "x" "y"
## 
## $row.names
## [1] 1 2 3
## 
## $class
## [1] "tbl_df"     "tbl"        "data.frame"


Tibbles (2)

Lazy and surly
Lazy
- Don't coerce their inputs (before R 4.0.0, you needed stringsAsFactors = FALSE to stop data.frame() turning strings into factors)
- Don't automatically convert non-syntactic names:

names(data.frame(`1` = 1))

## [1] "X1"

names(tibble(`1` = 1))

## [1] "1"

Tibbles do not support row names
Tibbles have a nicer print method
Subsetting: [ always returns a tibble, and $ doesn't do partial matching
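A sketch of the two subsetting differences, assuming the tibble package is installed. The column name xyz is chosen only so that "x" is a partial match for it.

```r
library(tibble)

df <- data.frame(xyz = "a")
tb <- tibble(xyz = "a")

# $ partial matching: data frames match "x" to "xyz", tibbles do not
df$x                    # "a"
suppressWarnings(tb$x)  # NULL (tibble warns about an unknown column)

# [ with a single column: data frames simplify, tibbles stay tibbles
class(df[, "xyz"])      # "character"
class(tb[, "xyz"])      # "tbl_df" "tbl" "data.frame"
```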


List Columns (1)

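Data frames support list columns, but you either have to add the list after creation or wrap it in I() inside data.frame():

# Option 1: add the list column after creating the data frame
df <- data.frame(x = 1:3)
df$y <- list(1:2, 1:3, 1:4)
# Option 2: wrap the list in I() so data.frame() keeps it as one column
data.frame(
  x = 1:3, 
  y = I(list(1:2, 1:3, 1:4))
)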

##   x          y
## 1 1       1, 2
## 2 2    1, 2, 3
## 3 3 1, 2, 3, 4
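Why I() is needed: without it, data.frame() treats each element of the list as a separate column, so lengths 2, 3 and 4 clash with the 3 rows of x. A minimal sketch:

```r
# Without I(): each list element becomes its own column, and the
# mismatched lengths make data.frame() fail
res <- tryCatch(
  data.frame(x = 1:3, y = list(1:2, 1:3, 1:4)),
  error = function(e) e
)
conditionMessage(res)  # a "differing number of rows" error message

# With I(): each row holds one element of the list
df <- data.frame(x = 1:3, y = I(list(1:2, 1:3, 1:4)))
df$y[[2]]              # 1 2 3
```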


List Columns (2)

Easier with tibbles:

tibble(
  x = 1:3, 
  y = list(1:2, 1:3, 1:4)
)

## # A tibble: 3 x 2
##       x y        
##   <int> <list>   
## 1     1 <int [2]>
## 2     2 <int [3]>
## 3     3 <int [4]>

Data frames and tibbles can also have matrix, array, and data frame columns
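A sketch of matrix and data frame columns in a base data frame. Each such column must have the same number of rows as the data frame; the values are arbitrary.

```r
dfm <- data.frame(x = 1:3 * 10)

# A matrix column: a single "column" of dfm that is itself a 3 x 3 matrix
dfm$y <- matrix(1:9, nrow = 3)

# A data frame column, with the same number of rows as dfm
dfm$z <- data.frame(a = 3:1, b = letters[1:3])

str(dfm)
dim(dfm$y)   # 3 3
ncol(dfm)    # 3: x, y and z each count as one column
```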
