Advanced R

Chapter 2: Names and Values

Josh Pohlkamp-Hartt

@JPohlkampHartt

2021-02-06

Outline

The distinction between names and values
When R makes a copy, and how to track them
How much memory an object actually occupies
Exceptions to copy-on-modify
The garbage collector

Prerequisites

The lobstr package is used to understand the internal representation of R objects.

library(lobstr)

Binding basics

Consider this simple example, what is happening?

x <- c(1, 2, 3)

Are we creating x, with the values 1, 2, 3?

Not really. It’s more accurate to say that this code is doing two things:

Creating an object, a vector of values, c(1, 2, 3).
Binding that object to a name, x.


Binding basics cont.

In fact, you can think of a name as a reference to a value.

For example, the following code doesn't copy the vector c(1, 2, 3); it creates another binding to the existing object:

y <- x

We can use lobstr::obj_addr() to see the memory address of the object each name is bound to:

obj_addr(x)

## [1] "0x7fa0c0336e28"

obj_addr(y)

## [1] "0x7fa0c0336e28"
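As a quick check, the two addresses can be compared in code rather than by eye. This is a minimal sketch assuming lobstr is installed; the address strings change from session to session, so only their equality is meaningful.

```r
library(lobstr)

x <- c(1, 2, 3)
y <- x  # bind a second name to the same object; nothing is copied

# Both names resolve to the same object, so their addresses are identical
identical(obj_addr(x), obj_addr(y))
## [1] TRUE
```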

Copy-on-modify

What happens to x when we modify y?

y[[3]] <- 4
x

## [1] 1 2 3

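Changing y did not modify x. Instead, R created a modified copy of the object and rebound y to it, leaving the original object (still bound to x) unchanged.

This behavior is called copy-on-modify.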

obj_addr(x)

## [1] "0x7fa0c0336e28"

obj_addr(y)

## [1] "0x7fa0c41eca08"
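The same behavior can be checked programmatically. In this sketch (assuming lobstr is installed; addr_before is just an illustrative variable name) we record y's address before the modification, then confirm that x kept the original object while y was rebound to a new one.

```r
library(lobstr)

x <- c(1, 2, 3)
y <- x
addr_before <- obj_addr(y)  # at this point y shares x's object

y[[3]] <- 4  # copy-on-modify: y is rebound to a modified copy

x
## [1] 1 2 3
y
## [1] 1 2 4
obj_addr(x) == addr_before  # x still points at the original object
## [1] TRUE
obj_addr(y) == addr_before  # y now points at a new object
## [1] FALSE
```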


Copy-on-modify cont.

We can use tracemem() to track when an object gets copied. Once an object is marked, R prints a message with its old and new addresses every time it is copied.

x <- c(1, 2, 3)
cat(tracemem(x), "\n")

## <0x7fa0be68bcd8>

y <- x
y[[3]] <- 4L

## tracemem[0x7fa0be68bcd8 -> 0x7fa0bd53a8e8]:

If we modify y again, we will not make another copy. That’s because the new object now has only a single name bound to it, y, so R applies the modify-in-place optimisation.

y[[3]] <- 5L
untracemem(x)

untracemem() is the opposite of tracemem(); it turns tracing off.

Copy-on-modify: Lists

It’s not just names that point to values; elements of lists do too.

Consider this list, which appears similar to the vector above.

l1 <- list(1, 2, 3)

This list is more complex because instead of storing the values themselves, it stores references to them.

This is particularly important when we modify a list. Start by binding a second name to the same list:

l2 <- l1


Copy-on-modify: Lists cont.

When we modify l2, R makes a shallow copy: the list object and its references are copied, but the values the references point to are not.

l2[[3]] <- 4

To see which values are shared across lists, use lobstr::ref().

ref(l1, l2)

## █ [1:0x7fa0bd6a38a8] <list> 
## ├─[2:0x7fa0bd669fc0] <dbl> 
## ├─[3:0x7fa0bd669f88] <dbl> 
## └─[4:0x7fa0bd669f50] <dbl> 
##  
## █ [5:0x7fa0bd7f4ac8] <list> 
## ├─[2:0x7fa0bd669fc0] 
## ├─[3:0x7fa0bd669f88] 
## └─[6:0x7fa0bd785bd0] <dbl>
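The same shallow copy can be checked element by element. This sketch assumes lobstr's obj_addrs(), which returns the address of each element of a list.

```r
library(lobstr)

l1 <- list(1, 2, 3)
l2 <- l1
l2[[3]] <- 4  # shallow copy: a new list whose first two elements are shared

obj_addr(l1) == obj_addr(l2)    # the list objects themselves differ
## [1] FALSE
obj_addrs(l1) == obj_addrs(l2)  # only the third element was replaced
## [1]  TRUE  TRUE FALSE
```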


Copy-on-modify: Data Frames

Data frames are lists of vectors, so copy-on-modify has important consequences.

d1 <- data.frame(x = c(1, 5, 6),
                 y = c(2, 4, 3))

If we modify a column, only that column needs to be copied; the others still point to their original vectors:

d2 <- d1
d2[, 2] <- d2[, 2] * 2
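A sketch of how to confirm which columns were copied, assuming lobstr is installed and treating the data frame as the list of column vectors it is: only the modified column y should get a new address.

```r
library(lobstr)

d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
d2 <- d1
d2[, 2] <- d2[, 2] * 2  # modify one column

# Compare column addresses: x is still shared, y was copied
obj_addrs(d1) == obj_addrs(d2)
## [1]  TRUE FALSE
```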

If we modify a row, every column is modified, which means every column must be copied:

d3 <- d1
d3[1, ] <- d3[1, ] * 3
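Running the same check after a row modification (again a sketch assuming lobstr) should show that no column is shared any more:

```r
library(lobstr)

d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
d3 <- d1
d3[1, ] <- d3[1, ] * 3  # modify one row, which touches every column

obj_addrs(d1) == obj_addrs(d3)
## [1] FALSE FALSE
```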


Copy-on-modify: Character Vectors

R actually uses a global string pool where each element of a character vector is a pointer to a unique string in the pool:

x <- c("a", "a", "abc", "d")
ref(x, character = TRUE)

## █ [1:0x7fa0bd6a3ad8] <chr> 
## ├─[2:0x7fa0bcac6130] <string: "a"> 
## ├─[2:0x7fa0bcac6130] 
## ├─[3:0x7fa0bd459d58] <string: "abc"> 
## └─[4:0x7fa0bf1ab2e8] <string: "d">

This has implications for how much memory a character vector uses. To find out, we use lobstr::obj_size().

obj_size(x)

## 248 B

obj_size("d")

## 112 B

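x <- c(x, "d")
obj_size(x)

## 264 B

Appending a second "d" adds only 16 B, far less than the 112 B of a standalone "d": the new element is just another pointer to the string already in the pool.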
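To see the pool's effect at a larger scale, compare a vector that repeats a single string with one that holds distinct strings. This is a sketch assuming lobstr (same and different are illustrative names); exact byte counts vary by platform, so only the comparison is shown.

```r
library(lobstr)

same <- rep("a", 1000)             # 1,000 pointers to one pooled string
different <- as.character(1:1000)  # 1,000 pointers to 1,000 distinct strings

as.numeric(obj_size(same)) < as.numeric(obj_size(different))
## [1] TRUE
```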


obj_size ≠ object.size

When checking object size, lobstr::obj_size() gives a more accurate result, because it accounts for values that are shared between elements.

The documentation for utils::object.size() acknowledges this limitation:

"...it should be reasonably accurate for atomic vectors, but does not detect if elements of a list are shared."

--- ?object.size

Here y is a list of 100 references to the same vector of 10,000 random numbers:

y <- rep(list(runif(1e4)), 100)

obj_size(y)

## 80,896 B

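object.size(y)

## 8005648 bytes

Because object.size() counts the shared vector once for each of the 100 references, it overstates the size roughly 100-fold.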
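A sketch contrasting the shared list with one built from 100 separate vectors (assuming lobstr; shared and distinct are illustrative names). obj_size() tells the two apart, while object.size() reports the same size for both:

```r
library(lobstr)

shared   <- rep(list(runif(1e4)), 100)             # 100 references to one vector
distinct <- lapply(1:100, function(i) runif(1e4))  # 100 separate vectors

# obj_size() counts the shared vector only once
as.numeric(obj_size(shared)) < as.numeric(obj_size(distinct))
## [1] TRUE

# object.size() counts every element in full, so both lists look the same size
as.numeric(object.size(shared)) == as.numeric(object.size(distinct))
## [1] TRUE
```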


Modify-in-place

Modifying an R object usually creates a copy. The exceptions are:

Objects with a single binding, which R modifies in place as an optimisation (as shown earlier).

Environments, a special type of object, which are always modified in place.

Implication: we can create functions that “remember” their previous state.

e1 <- rlang::env(a = 1, b = 2, c = 3)
e2 <- e1

e1$c <- 4
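A base-R sketch of the same idea, using list2env() in place of rlang::env(): because e1 and e2 are bound to the same environment, a change made through one name is visible through the other. The counter that follows (make_counter is a hypothetical name) shows how this lets a function remember its previous state.

```r
e1 <- list2env(list(a = 1, b = 2, c = 3))
e2 <- e1

e1$c <- 4  # modifies the environment in place; no copy is made
e2$c
## [1] 4
identical(e1, e2)  # both names point to the same environment
## [1] TRUE

# A closure keeps its enclosing environment, so state persists between calls
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # update count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
## [1] 1
counter()
## [1] 2
```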


Unbinding and the garbage collector

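x <- 1:3

x <- 2:4  # the object 1:3 no longer has a name bound to it

rm(x)     # now 2:4 has no name bound to it either

We created two objects, but by the end of this code neither is bound to a name.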

Objects that no longer have any name bound to them are deleted by the garbage collector (GC), which frees up memory by deleting R objects that are no longer used. The GC runs automatically whenever R needs more memory to create a new object.

There is no need to call gc() yourself unless you want to:

ask R to return memory to your operating system so other programs can use it, or
find out how much memory is currently being used (lobstr::mem_used() is a simpler way to do this).
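A sketch of watching memory being reclaimed, assuming lobstr::mem_used() (which calls gc() itself to get an up-to-date figure). The vector below is roughly 8 MB, so removing its only binding should produce a clear drop; the 4e6-byte threshold is an arbitrary margin chosen for illustration.

```r
library(lobstr)

x <- runif(1e6)       # about 8 MB of doubles, bound only to x
before <- mem_used()  # memory in use while x is still reachable

rm(x)                 # remove the only binding; the vector is now unreachable
after <- mem_used()   # mem_used() runs gc(), so the vector is reclaimed

as.numeric(before) - as.numeric(after) > 4e6
## [1] TRUE
```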
