Advanced R

Chapter 2: Names and Values

Josh Pohlkamp-Hartt

@JPohlkampHartt

2021-02-06

Outline

  • The distinction between names and values

  • When R makes a copy, and how to track them

  • How much memory an object actually occupies

  • Exceptions to copy-on-modify

  • The garbage collector

Prerequisites

The lobstr package is used to understand the internal representation of R objects.

library(lobstr)

Binding basics

Consider this simple example. What is happening?

x <- c(1, 2, 3)

Are we creating an object called x that has the values 1, 2, 3?

Not really. It’s more accurate to say that this code is:

  • Creating an object, a vector of values, c(1, 2, 3)
  • Binding that object to a name, x
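A minimal base R sketch of the same idea, using the illustrative names a and b (chosen so they don't clash with x): once two names are bound to one object, rebinding one of the names to a new object leaves the other name's value untouched.

```r
a <- c(1, 2, 3)     # create a vector and bind it to the name a
b <- a              # bind a second name, b, to the same vector
a <- c(10, 20, 30)  # rebind a to a brand-new vector
b                   # b is still bound to the original vector
## [1] 1 2 3
```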

Binding basics cont.

In fact, you can think of a name as a reference to a value.

For example, in the following code we don't copy the vector c(1, 2, 3); we get another binding to the existing object:

y <- x

We can use lobstr::obj_addr() to see the address of the object that each name is bound to:

obj_addr(x)
## [1] "0x7fa0c0336e28"
obj_addr(y)
## [1] "0x7fa0c0336e28"
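Conversely, two objects can hold equal values without being the same object. A short sketch, assuming lobstr is installed; z is an illustrative name:

```r
library(lobstr)

x <- c(1, 2, 3)
y <- x           # a second binding to the same object
z <- c(1, 2, 3)  # an equal vector, created separately

obj_addr(x) == obj_addr(y)  # same object
## [1] TRUE
obj_addr(x) == obj_addr(z)  # equal values, different objects
## [1] FALSE
identical(x, z)             # identical() compares values, not addresses
## [1] TRUE
```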

Copy-on-modify

What happens to x when we modify y?

y[[3]] <- 4
x
## [1] 1 2 3

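Changing y did not modify x.

This behavior is called copy-on-modify: R copied the object, modified the copy, and bound y to the modified copy. The addresses confirm that x and y now refer to different objects: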

obj_addr(x)
## [1] "0x7fa0c0336e28"
obj_addr(y)
## [1] "0x7fa0c41eca08"

Copy-on-modify cont.

We can use tracemem() to track when an object gets copied. Once tracing is on, R prints a message showing the object's old and new addresses every time it is copied.

x <- c(1, 2, 3)
cat(tracemem(x), "\n")
## <0x7fa0be68bcd8>
y <- x
y[[3]] <- 4L
## tracemem[0x7fa0be68bcd8 -> 0x7fa0bd53a8e8]:

If we modify y again, R will not make another copy. That’s because the new object now has only a single name bound to it, y, so R applies the modify-in-place optimisation.

y[[3]] <- 5L
untracemem(x)

untracemem() is the opposite of tracemem(); it turns tracing off.

Copy-on-modify: Lists

It’s not just names that point to values; elements of lists do too.

Consider this list, which appears similar to the vector above.

l1 <- list(1, 2, 3)

This list is more complex because instead of storing the values themselves, it stores references to them.

This is particularly important when we copy and then modify a list:

l2 <- l1

Copy-on-modify: Lists cont.

When we modify l2, R makes a shallow copy: the list object and its bindings are copied, but the values the bindings point to are not.

l2[[3]] <- 4

To see which values are shared across lists, use lobstr::ref():

ref(l1, l2)
## █ [1:0x7fa0bd6a38a8] <list>
## ├─[2:0x7fa0bd669fc0] <dbl>
## ├─[3:0x7fa0bd669f88] <dbl>
## └─[4:0x7fa0bd669f50] <dbl>
##
## █ [5:0x7fa0bd7f4ac8] <list>
## ├─[2:0x7fa0bd669fc0]
## ├─[3:0x7fa0bd669f88]
## └─[6:0x7fa0bd785bd0] <dbl>
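The same sharing can be checked element by element with lobstr::obj_addr(). This is a sketch assuming lobstr is installed; it relies on [[ returning the stored element itself rather than a copy.

```r
library(lobstr)

l1 <- list(1, 2, 3)
l2 <- l1
l2[[3]] <- 4  # triggers a shallow copy of the list

obj_addr(l1) == obj_addr(l2)            # the list objects differ
## [1] FALSE
obj_addr(l1[[1]]) == obj_addr(l2[[1]])  # unchanged elements are shared
## [1] TRUE
obj_addr(l1[[3]]) == obj_addr(l2[[3]])  # the modified element is new
## [1] FALSE
```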
Copy-on-modify: Data Frames

Data frames are lists of vectors (one vector per column), so copy-on-modify has important consequences.

d1 <- data.frame(x = c(1, 5, 6),
                 y = c(2, 4, 3))
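We can confirm in base R that a data frame is stored as a list with one element per column:

```r
d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))

typeof(d1)   # the underlying type of a data frame
## [1] "list"
length(d1)   # one element per column
## [1] 2
```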

If we modify a column, only that column needs to be copied; the other columns still point to their original values:

d2 <- d1
d2[, 2] <- d2[, 2] * 2
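A sketch for checking which columns d1 and d2 share after the column update, assuming lobstr is installed:

```r
library(lobstr)

d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
d2 <- d1
d2[, 2] <- d2[, 2] * 2  # modify column y only

obj_addr(d1[["x"]]) == obj_addr(d2[["x"]])  # column x is still shared
## [1] TRUE
obj_addr(d1[["y"]]) == obj_addr(d2[["y"]])  # column y was copied
## [1] FALSE
```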

If we modify a row, every column is modified, which means every column must be copied:

d3 <- d1
d3[1, ] <- d3[1, ] * 3
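The same kind of check, again assuming lobstr is installed, shows that a row update leaves no column shared with the original:

```r
library(lobstr)

d1 <- data.frame(x = c(1, 5, 6), y = c(2, 4, 3))
d3 <- d1
d3[1, ] <- d3[1, ] * 3  # modify the first row, which touches every column

obj_addr(d1[["x"]]) == obj_addr(d3[["x"]])  # column x was copied
## [1] FALSE
obj_addr(d1[["y"]]) == obj_addr(d3[["y"]])  # column y was copied
## [1] FALSE
```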

Copy-on-modify: Character Vectors

R actually uses a global string pool where each element of a character vector is a pointer to a unique string in the pool:

x <- c("a", "a", "abc", "d")
ref(x, character = TRUE)
## █ [1:0x7fa0bd6a3ad8] <chr>
## ├─[2:0x7fa0bcac6130] <string: "a">
## ├─[2:0x7fa0bcac6130]
## ├─[3:0x7fa0bd459d58] <string: "abc">
## └─[4:0x7fa0bf1ab2e8] <string: "d">

This has implications for how much memory a character vector uses. To find out, we use lobstr::obj_size().

obj_size(x)
## 248 B
obj_size("d")
## 112 B
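x <- c(x, "d")
obj_size(x)
## 264 B

Appending a second "d" grows x by only 16 B, far less than the 112 B a standalone "d" takes: the new element is a pointer to the "d" already in the pool rather than a new copy of the string.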
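To see the effect of the global string pool at a larger scale, here is a sketch (assuming lobstr is installed; the vector names are illustrative) comparing 1,000 copies of one string with 1,000 distinct strings. The first vector needs only one "abc" in the pool, so it should be several times smaller. Exact sizes depend on the lobstr version, so no output is shown.

```r
library(lobstr)

# 1,000 elements that all point to the same string in the pool
repeated <- rep("abc", 1000)
# 1,000 elements that point to 1,000 different strings
distinct <- paste0("abc", 1:1000)

obj_size(repeated)
obj_size(distinct)
```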
obj_size() vs object.size()

When checking object size, lobstr::obj_size() gives a more accurate result than utils::object.size(), because it detects values that are shared.

The documentation for utils::object.size() acknowledges this:

"...it should be reasonably accurate for atomic vectors, but does not detect if elements of a list are shared."

--- ?object.size
y <- rep(list(runif(1e4)), 100)
obj_size(y)
## 80,896 B
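object.size(y)
## 8005648 bytes

Here y holds 100 references to the same vector of 10,000 doubles. obj_size() counts that vector once (about 80 kB), while object.size() counts it 100 times (about 8 MB).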
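obj_size() also accepts several objects at once and reports their combined size, counting shared values only once. A sketch, assuming lobstr is installed (big and refs are illustrative names):

```r
library(lobstr)

big <- runif(1e6)            # about 8 MB of doubles
refs <- list(big, big, big)  # three references to the same vector

obj_size(big)
obj_size(refs)       # barely larger than big: the vector is counted once
obj_size(big, refs)  # combined size; the shared vector is still counted once
```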
Modify-in-place

Modifying an R object usually creates a copy. The exceptions are:

  • Objects with a single binding (as shown earlier)
  • Environments, a special type of object, which are always modified in place

Implication: we can create functions that “remember” their previous state.

e1 <- rlang::env(a = 1, b = 2, c = 3)
e2 <- e1

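e1$c <- 4
e2$c
## [1] 4

Because e1 and e2 are bound to the same environment, the change made through e1 is visible through e2.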
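Because environments are modified in place, a function can keep state in its enclosing environment. A minimal base R sketch of a function that "remembers" how many times it has been called (make_counter is an illustrative name; `<<-` assigns in the enclosing environment):

```r
# A function factory: each counter keeps its own count in its enclosing environment
make_counter <- function() {
  count <- 0
  function() {
    # `<<-` updates `count` in the enclosing environment,
    # which is modified in place rather than copied
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
## [1] 1
counter()
## [1] 2
```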

Unbinding and the garbage collector

x <- 1:3
x <- 2:4
rm(x)

After this code runs, neither 1:3 nor 2:4 is bound to a name. Such objects are deleted by the garbage collector (GC), which frees up memory by deleting R objects that are no longer used. GC runs automatically whenever R needs more memory to create a new object.

There is no reason to call gc() yourself unless you want to:

  • ask R to return memory to your operating system so other programs can use it, or
  • find out how much memory is currently being used (lobstr::mem_used() also reports this)
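A sketch of the GC at work, assuming lobstr is installed: memory use rises when we allocate a large vector and falls again once the vector has no name bound to it and the GC has run. Exact figures vary between sessions, so no output is shown.

```r
library(lobstr)

before <- mem_used()
x <- runif(1e6)      # allocate roughly 8 MB
with_x <- mem_used()
rm(x)                # the vector no longer has a name bound to it
invisible(gc())      # ask the GC to run now instead of waiting
after <- mem_used()

with_x - before      # roughly 8 MB more in use
with_x - after       # roughly 8 MB released
```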