Advanced R by Hadley Wickham

Chapter 3: Vectors

Tony ElHabr

@TonyElHabr

2020-04-16

What's in Chapter 3



  • Section 3.2: atomic vectors

  • Section 3.3: attributes

  • Section 3.4: "special" vectors (S3 atomic vectors)

  • Section 3.5: lists

  • Section 3.6: data frames and tibbles

  • Section 3.7: NULL

Vectors

  • 2 types: atomic and list
The difference is that all elements of an atomic vector have the same type, while the elements of a list can have different types. In general, the word "atomic" describes things that are irreducible units or components of a system. Here, the irreducible units are the data types, and the system is the R programming language. So we can see why it's appropriate to call vectors of a single type "atomic". Lists have their own special properties.

atomic

list

... and there is also NULL

Bringing NULL into the picture, we have our complete set of vector types.
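
As a rough illustration of the atomic/list distinction, here is a minimal sketch in base R. The variable names are made up for this example.

```r
# An atomic vector: every element shares one type.
atomic_vec <- c(1.5, 2, 3)
typeof(atomic_vec)

# A list: each element may have its own type.
list_vec <- list(1.5, "a", TRUE)
element_types <- sapply(list_vec, typeof)
element_types

# Predicates for telling the two kinds apart.
is.atomic(atomic_vec)
is.list(list_vec)
```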

Atomic Vectors

  • 4 primary types: logical, integer, double, character (i.e. strings)

c(TRUE, FALSE, T, F)
c(1234L, 42L)
c(3.14, .314e1, 0xbada55)
c('single quote', "double quote")
I think everyone is probably familiar with these types, so I won't spend too much time on them. A single value of one of these types is what we call a "scalar". In other programming languages, a scalar may be a different kind of object from a vector. In R there is no separate scalar type: a "scalar" is just a vector of length 1.

... also raw and complex

raw(42)
complex(real = 0, imaginary = -1)
  • Check type with typeof()
Use typeof() to identify the type of a variable.
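
A small sketch of typeof() applied to one value of each atomic type from this slide. The `examples` and `types` names are only for illustration.

```r
# One example value per atomic type.
examples <- list(
  TRUE,
  42L,
  0xbada55,  # hexadecimal literals are doubles
  "text",
  raw(2),
  complex(real = 0, imaginary = -1)
)
types <- vapply(examples, typeof, character(1))
types

# A "scalar" is just a vector of length 1.
scalar <- 3.14
length(scalar)
is.vector(scalar)
```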
Coercion

  • Coercion happens when you attempt to combine vectors with elements of different types
Coercion often happens automatically. Most mathematical functions (+, log, abs, etc.) will coerce to numeric.
  • Coercion order: character → double → integer → logical
c(1, 1.01) # to double
## [1] 1.00 1.01
c(1, '1') # to character
## [1] "1" "1"
c(1L, TRUE) # to integer
## [1] 1 1
  • Explicitly coerce with as.*() functions
as.integer(c(1, 1.01))
## [1] 1 1
  • Failed coercion leads to warnings and NA
as.integer(c('1', '1.01', 'a'))
## Warning: NAs introduced by coercion
## [1] 1 1 NA
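
To make the coercion order concrete, here is a short sketch that checks the resulting type of a few mixed combinations with typeof(). It also shows the common trick of summing a logical vector, which relies on automatic coercion to numbers. The variable names are illustrative.

```r
# Combining types always moves toward the more flexible type.
lgl_int <- typeof(c(TRUE, 1L))    # logical + integer
int_dbl <- typeof(c(1L, 1.5))     # integer + double
dbl_chr <- typeof(c(1.5, "a"))    # double + character

# Mathematical functions coerce logicals: TRUE is 1, FALSE is 0.
flags <- c(TRUE, FALSE, TRUE)
n_true <- sum(flags)

# A failed explicit coercion yields NA (with a warning, suppressed here).
failed <- suppressWarnings(as.integer("a"))
```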
NA and NULL

  • NA is a "sentinel" value for explicit missingness

  • NA can be of any type, e.g. NA_integer_, NA_character_, etc.

A plain NA is logical by default; it is coerced to the appropriate type when combined with other values.
  • Calculations involving NAs usually result in more NAs
1 + NA
## [1] NA

...although not always

1 | NA
## [1] TRUE
  • Test with is.na()
  • NULL is its own vector type
typeof(NULL)
## [1] "NULL"
  • Zero-length
length(NULL)
## [1] 0
  • Cannot have attributes
x <- NULL
attr(x, 'y') <- 1 # error
A vector with NA can have attributes.
  • Test with is.null()
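
A brief sketch of how NA and NULL differ in practice, using only base R. The variable names are illustrative.

```r
# NA has a type and a length of 1; NULL has its own type and length 0.
na_types <- c(typeof(NA), typeof(NA_integer_), typeof(NA_character_))
length(NA)
length(NULL)

# Comparing with NA gives NA, so use is.na() to test for missingness.
cmp <- NA == NA
missing_flags <- is.na(c(1, NA, 3))

# Some operations have a known answer even when one input is missing.
known <- c(FALSE & NA, TRUE | NA)

# is.null() only detects NULL, not NA.
is.null(NULL)
is.null(NA)
```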
Speaking of attributes...
Attributes

  • Name-value pairs of metadata for R objects

  • Get and set a single attribute with attr()

x <- 'a'
attr(x, 'what') <- 'apple'
attr(x, 'what')
## [1] "apple"
  • Get and set multiple attributes with attributes() and structure()
x <- structure('a', what = 'apple', type = 'fruit')
attributes(x)
## $what
## [1] "apple"
##
## $type
## [1] "fruit"
  • With the exception of names() and dim(), most attributes are lost with calculations
attributes(x[1])
## NULL
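
A minimal sketch of which attributes survive common operations. The custom attribute name `what` follows the slide; the other names are made up for illustration.

```r
x <- structure(1:3, what = "counts")

# Custom attributes are dropped by subsetting and by summary functions.
after_subset <- attributes(x[1])
after_sum <- attributes(sum(x))

# Names, by contrast, are kept when subsetting.
y <- c(a = 1, b = 2)
kept_names <- names(y[1])
```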
names()

  • names() can be assigned in multiple ways
x <- c(apple = 'a', banana = 'b') # 1
x
##  apple banana
##    "a"    "b"
y <- c('a', 'b')
names(y) <- c('apple', 'banana') # 2
y
##  apple banana
##    "a"    "b"
setNames(y, c('apple', 'banana')) # 3
##  apple banana
##    "a"    "b"
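
Once a vector has names, they can be used for subsetting and removed again. A short sketch, with illustrative variable names:

```r
x <- c(apple = "a", banana = "b")

# Subset by name; [[ returns the bare element.
one <- x[["banana"]]
reordered <- x[c("banana", "apple")]

# unname() returns a copy without names.
bare <- unname(x)
```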
dim()

  • dim() has the capability of turning a 1-d vector into a 2-d matrix or an n-d array
a <- matrix(1:6, nrow = 2, ncol = 3)
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
b <- array(1:6, dim = c(1, 3, 2))
b
## , , 1
##
##      [,1] [,2] [,3]
## [1,]    1    2    3
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,]    4    5    6
  • Weird things

    • 1-d vector without a dim attribute has NULL dimension

    • Matrices and arrays can be a single column or row vector
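
The slide's examples use matrix() and array(); the sketch below sets the dim attribute directly and checks the two "weird things". Variable names are illustrative.

```r
# A plain vector has no dim attribute.
x <- 1:6
dim_before <- dim(x)

# Assigning dim turns the same data into a 2 x 3 matrix, filled by column.
dim(x) <- c(2, 3)
dim_after <- dim(x)
is.matrix(x)

# A single row or single column is still a matrix, not a plain vector.
row_vec <- matrix(1:3, nrow = 1)
col_vec <- matrix(1:3, ncol = 1)
dim(row_vec)
dim(col_vec)
```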

S3 atomic vectors

  • Objects with a class attribute, making them S3 objects
This means that the vector will behave differently when passed to a generic function such as print(), or even to mathematical operators such as +.
  • 4 important S3 vector types in base R: factor (categorical data), Date (dates), POSIXct (date-times), difftime (durations).
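
A small sketch of what "an atomic vector with a class attribute" means, using a Date as the example. The variable names are illustrative.

```r
d <- as.Date("2020-04-16")

# The class attribute is what makes this an S3 object...
class(d)
inherits(d, "Date")

# ...but underneath it is still a double.
base_type <- typeof(d)

# Stripping the class reveals the underlying number (days since 1970-01-01).
raw_value <- unclass(d)

# Generics such as format() use the class to pick a method.
shown <- format(d)
```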

Factors

  • Vector that can only contain pre-defined values
Thus, it is commonly used to store categorical data.
  • Has two attributes: class and levels

  • Built on top of integers, not characters

fruits <- factor(c('banana', 'apple', 'carrot'))
fruits
## [1] banana apple carrot
## Levels: apple banana carrot
Note that the levels are sorted alphabetically by default.
  • Variation: ordered factors
x <- ordered(c('two', 'three', 'one'), levels = c('one', 'two', 'three'))
x
## [1] two three one
## Levels: one < two < three
[Interesting story about stringsAsFactors](https://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/)
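
To see that factors are integers with a levels attribute, and what "pre-defined values" implies, here is a short sketch. The `sizes` example is made up for illustration.

```r
fruits <- factor(c("banana", "apple", "carrot"))

# Built on integers: each value is an index into the levels.
base_type <- typeof(fruits)
codes <- as.integer(fruits)
lvls <- levels(fruits)

# Values outside the declared levels become NA.
sizes <- factor(c("s", "m", "xl"), levels = c("s", "m", "l"))
is.na(sizes)

# Levels with no observations still show up in tabulations.
counts <- as.vector(table(sizes))
```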
Date, POSIXct, and duration

  • All built on top of doubles

  • Dates have class = "Date"

  • Date-times are trickier...

    • Represent seconds since Jan. 1, 1970

    • POSIXct isn't the only possible class; there's also POSIXlt

    • Also have a "parent" class of POSIXt

    • Have a tzone attribute

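The "ct" in POSIXct stands for calendar time and the "lt" in POSIXlt for local time.
  • Durations have 2 attributes: class = "difftime" and units corresponding to a temporal unit, e.g. "days"
The units attribute determines how the underlying double is interpreted. By contrast, the tzone attribute of a date-time only controls how it is formatted for printing.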
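
A minimal sketch of the doubles underneath these three classes. The specific dates are chosen so the underlying numbers are easy to verify by hand; the variable names are illustrative.

```r
# Date: days since 1970-01-01.
d <- as.Date("1970-01-02")
days <- as.numeric(d)

# POSIXct: seconds since 1970-01-01 00:00:00 UTC, plus a tzone attribute.
dt <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
secs <- as.numeric(dt)
class(dt)
attr(dt, "tzone")

# difftime: a double plus a units attribute.
one_week <- as.difftime(1, units = "weeks")
attr(one_week, "units")
stored <- as.numeric(one_week)
in_days <- as.numeric(one_week, units = "days")
```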
Lists

  • Each element can be of any type, including another list

Lists are sometimes called "recursive" vectors because a list can contain other lists. Atomic vectors don't have this capability.
  • Each element is really a reference
As shown in the last chapter, the elements of a list are references to other objects. ("Pointer" may be a more familiar term.) This means that the size of a list does not scale the way you might expect.
x <- 1L
lobstr::obj_size(x)
## 56 B
lobstr::obj_size(rep(x, 3L))
## 64 B
  • Combining with c is different than wrapping with list()
This follows from the capability of lists to store other lists.
x <- list(a = 1, b = 2)
y <- list(c = -1, d = -2)
length(list(x, y))
## [1] 2
length(c(x, y))
## [1] 4
  • Coercing to a list may not result in what you'd expect
as.list(letters[1:2])
## [[1]]
## [1] "a"
##
## [[2]]
## [1] "b"
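
The differences between list(), c(), and as.list() are easiest to see by comparing lengths. A short sketch with illustrative names:

```r
# A list can hold mixed types, including another list.
nested <- list(1:3, "a", list(TRUE, 2.5))
is.list(nested[[3]])

# list() wraps its argument as one element; as.list() splits it up.
wrapped <- list(1:3)
split_up <- as.list(1:3)
length(wrapped)
length(split_up)

# c() on two lists concatenates their elements instead of nesting them.
x <- list(a = 1, b = 2)
y <- list(c = -1, d = -2)
combined_names <- names(c(x, y))
```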
Data frames

  • S3 vectors built on top of lists

df <- data.frame(col1 = 1:2, col2 = c('a', 'b'))
df
## col1 col2
## 1 1 a
## 2 2 b
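  • Data frames have some undesirable default behavior, e.g. before R 4.0.0 strings were converted to factors by default
class(df$col2) # "factor" before R 4.0.0, where stringsAsFactors defaulted to TRUE
## [1] "character"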

... which spawned tibbles (with the {tibble} package)

tbl <- tibble::tibble(col1 = 1:2, col2 = c('a', 'b'))
class(tbl$col2)
## [1] "character"
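
"Built on top of lists" can be checked directly in base R. A minimal sketch, reusing the slide's example data frame:

```r
df <- data.frame(col1 = 1:2, col2 = c("a", "b"))

# Underneath, a data frame is a list of equal-length columns...
base_type <- typeof(df)
n_cols <- length(df)

# ...with names, class, and row.names attributes.
attr_names <- names(attributes(df))

# So list tools work on it column by column.
col_types <- vapply(df, typeof, character(1))
```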
Data frame vs tibble behavior

  • Tibbles don't coerce strings to factors by default

  • Tibbles discourage rownames, which are generally "bad"

Rownames are "bad" because: (1) they store metadata differently from the rest of the data, which is generally not a good idea; (2) they only work if a row can be identified by a single string; (3) they must be unique.
  • Tibbles have a "prettier" print method

  • Tibbles have stricter subsetting rules

Subsetting a tibble with [ always returns a tibble, and $ doesn't allow partial matching.
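
For contrast, here is a sketch of the looser base data frame behavior that the stricter tibble rules are a response to. It uses only base R, and the column names are made up for illustration.

```r
df <- data.frame(xyz = 1:2, abc = c("a", "b"))

# $ does partial matching on data frames: "x" silently matches "xyz".
partial <- df$x

# Selecting a single column with [ drops down to a plain vector...
dropped <- df[, "xyz"]
is.data.frame(dropped)

# ...unless drop = FALSE is requested explicitly.
kept <- df[, "xyz", drop = FALSE]
is.data.frame(kept)
```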
Not-your-typical columns

  • Data frame columns can be lists
Need to wrap with I() here. (I is for identity.)
data.frame(x = 1:2, y = I(list(1:3, 1:4)))
##   x          y
## 1 1    1, 2, 3
## 2 2 1, 2, 3, 4
  • Easier list-column creation with tibbles
tibble::tibble(x = 1:2, y = list(1:3, 1:4))
## # A tibble: 2 x 2
##       x y
##   <int> <list>
## 1     1 <int [3]>
## 2     2 <int [4]>
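  • Columns can even be matrices and data frames
Add these after creation (or wrap them in I()); passing them straight to data.frame() splices them into separate columns.
dfm <- data.frame(x = 1:2)
dfm$y <- matrix(3:6, nrow = 2) # matrix-column
dfm$z <- data.frame(a = 3:4, b = 5:6) # data-frame-column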
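
How a matrix ends up inside a data frame depends on how it is added. The sketch below compares passing it to data.frame() with assigning it afterwards; the variable names are illustrative.

```r
m <- matrix(3:6, nrow = 2)

# Passed to data.frame(), the matrix is split into one column per matrix column.
spliced <- data.frame(x = 1:2, y = m)
spliced_names <- names(spliced)

# Assigned after creation, it stays a single matrix-column.
nested <- data.frame(x = 1:2)
nested$y <- m
nested_names <- names(nested)
is.matrix(nested$y)
dim(nested$y)
```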

In Review
