Modify-in-place

  • Modifying an object usually creates a copy, except for:
    • Objects with a single binding (performance optimization)
    • Environments (special)

Objects with a single binding

  • Hard to predict whether a copy will occur
  • If an object has 2+ bindings and some are removed, R can’t track how many are left (so it keeps assuming there is more than one)
  • As a result, R may make a copy even if there’s only one binding left
  • Calling a function on an object makes a reference to it, unless the function is a primitive written in C
  • Best to use tracemem() to check rather than guess.
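
A minimal sketch of the single-binding case. The names v, w, and can_trace are illustrative and not from the original notes, and the capabilities("profmem") guard assumes that tracemem() is only usable when R was built with memory profiling. With one binding the modification typically happens in place and tracemem() stays silent; once a second binding exists, the next modification should report a copy. Results can differ in RStudio, where the environment pane holds extra references.

```r
# tracemem() needs an R build with memory profiling enabled
can_trace <- capabilities("profmem")

v <- c(1, 2, 3)
if (can_trace) tracemem(v)

# One binding: R can usually modify in place, so no tracemem message is expected
v[[3]] <- 4

# A second name now points at the same vector
w <- v

# Modifying v must leave w untouched, so R copies first (tracemem reports it)
v[[3]] <- 5

if (can_trace) untracemem(v)

v  # the modified vector
w  # still holds the values from before the last assignment
```

Whatever tracemem() prints, the observable behaviour is the same: w keeps its old values, which is the guarantee that copy-on-modify exists to provide.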

Example - lists vs. data frames in for loop

Setup

Create the data to modify

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

Data frame - Copied on every iteration (twice per iteration in this trace)!

cat(tracemem(x), "\n")
#> <0x560e51c1e378>
for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x560e51c1e378 -> 0x560e515d4958]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e515d4958 -> 0x560e514763c8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e514763c8 -> 0x560e51476438]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e51476438 -> 0x560e514764a8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e514764a8 -> 0x560e51476518]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e51476518 -> 0x560e51476588]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e51476588 -> 0x560e514765f8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e514765f8 -> 0x560e51476668]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e51476668 -> 0x560e514766d8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous> 
#> tracemem[0x560e514766d8 -> 0x560e51476748]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
untracemem(x)

List (uses internal C code) - Copied once!

y <- as.list(x)

cat(tracemem(y), "\n")
#> <0x560e518aa458>
for (i in seq_along(medians)) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x560e518aa458 -> 0x560e519acb48]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
untracemem(y)

Benchmark this (Exercise #2)

First wrap in a function

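# Subtract each column's median from that column of d.
# The for loop evaluates to NULL, so med() returns NULL invisibly for both
# inputs; only the time and memory cost are of interest here.
med <- function(d, medians) {
  for (i in seq_along(medians)) {
    d[[i]] <- d[[i]] - medians[[i]]
  }
}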

Try with 5 columns

x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)

bench::mark(
  "data.frame" = med(x, medians),
  "list" = med(y, medians)
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 data.frame   86.9µs   90.9µs    10473.     410KB     162.
#> 2 list         34.7µs   35.9µs    25451.     391KB     256.

Try with 20 columns

x <- data.frame(matrix(runif(5 * 1e4), ncol = 20))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)

bench::mark(
  "data.frame" = med(x, medians),
  "list" = med(y, medians)
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 data.frame  254.3µs  270.1µs     3675.     400KB     40.8
#> 2 list         38.4µs   40.4µs    23985.     392KB    246.

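WOW! With 20 columns the data frame version's median time roughly triples (90.9µs to 270.1µs), while the list version barely changes (35.9µs to 40.4µs).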

Environments

  • Always modified in place (reference semantics)
  • Interesting because when you modify an environment, every existing binding to it still refers to the same (now modified) object
  • If two names point to the same environment, and you update one, you update both!
e1 <- rlang::env(a = 1, b = 2, c = 3)
e2 <- e1
e1$c <- 4
e2$c
#> [1] 4
  • This means that environments can contain themselves (!)
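
A small sketch of an environment containing itself. It uses base R's new.env() in place of rlang::env() so that it runs without extra packages; the binding name self is illustrative only.

```r
# Create an empty environment (base R counterpart of rlang::env())
e <- new.env()

# Bind the environment inside itself; no copy is made, so this is truly circular
e$self <- e

# Following the self-reference any number of times leads back to the same object
identical(e, e$self)
#> [1] TRUE
identical(e, e$self$self$self)
#> [1] TRUE
```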

Exercises

1. Why isn’t this circular?
x <- list()
x[[1]] <- x

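Because of copy-on-modify: x[[1]] <- x does not modify the original list() object in place. R builds a new list, binds x to it, and stores the original (empty) list as its first element, so no list ever contains itself.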
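
One way to check this, as a quick sketch: inspect the result and confirm that the inner element is just an empty list rather than another reference to x.

```r
x <- list()
x[[1]] <- x

length(x)        # 1: the new list has a single element
length(x[[1]])   # 0: that element is the original empty list, not x itself

# The result is an ordinary nested list
identical(x, list(list()))
#> [1] TRUE
```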

2. (see “Objects with a single binding”)
3. What happens if you attempt to use tracemem() on an environment?
e1 <- rlang::env(a = 1, b = 2, c = 3)
tracemem(e1)
#> Error in tracemem(e1): 'tracemem' is not useful for promise and environment objects

Because environments are always modified in place, there’s no point in tracing them.
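
To illustrate the in-place behaviour, here is a hedged sketch in which a function updates an environment passed to it and the caller sees the change without any reassignment. The names add_one and counter are hypothetical, and base R's new.env() stands in for rlang::env().

```r
# Hypothetical helper: increments a counter stored in an environment
add_one <- function(env) {
  env$count <- env$count + 1  # modifies the caller's environment directly
  invisible(env)
}

counter <- new.env()
counter$count <- 0

# No assignment of the return value is needed; the change is visible outside
add_one(counter)
add_one(counter)

counter$count
#> [1] 2
```

With a list instead of an environment, the same function would only change its local copy, and the caller's object would stay at 0.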