Modify-in-place
- Modifying an object usually creates a copy, with two exceptions:
  - Objects with a single binding (performance optimization)
  - Environments (special)
Objects with a single binding
- Hard to predict whether a copy will occur
- R's reference count only distinguishes 0, 1, and many, so if an object has 2+ bindings and some are removed, the count does not drop back to 1
- So R may make a copy even if there's only one binding left
- Calling a function on an object makes a reference to it, unless the function is a primitive written in C
- Best to use tracemem() to check rather than guess.
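A minimal sketch of checking with tracemem(), assuming R was built with memory profiling (the guard below skips tracing otherwise). The names v and w are illustrative and not from the original notes.

```r
# tracemem() errors if R was built without memory profiling, so guard it
can_trace <- capabilities("profmem")

v <- c(1, 2, 3)
if (can_trace) cat(tracemem(v), "\n")

w <- v        # second binding to the same object: no copy yet
w[[3]] <- 4   # modifying through w: a copy is expected to be reported here
w[[3]] <- 5   # w now has its own object, so no further copy is expected

if (can_trace) untracemem(v)
```

The original object is untouched by the modification through w, which is the copy-on-modify behavior the bullets describe.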
Example - lists vs. data frames in a for loop
Setup
Create the data to modify
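The setup code itself is not shown at this point. A sketch that is consistent with the five-column benchmark data used later in these notes (an assumption, not necessarily the exact original):

```r
# Assumed setup: 5 numeric columns of 10,000 random values each
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))

# One median per column, returned as a named numeric vector
medians <- vapply(x, median, numeric(1))
```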
Data frame - Copied on every iteration (twice per column in the output below)!
cat(tracemem(x), "\n")
#> <0x557c2b7c2448>
for (i in seq_along(medians)) {
x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x557c2b7c2448 -> 0x557c2b52d988]:
#> tracemem[0x557c2b52d988 -> 0x557c2b50bef8]: [[<-.data.frame [[<-
#> tracemem[0x557c2b50bef8 -> 0x557c2b4a7f38]:
#> tracemem[0x557c2b4a7f38 -> 0x557c2b4a8558]: [[<-.data.frame [[<-
#> tracemem[0x557c2b4a8558 -> 0x557c2b4a64e8]:
#> tracemem[0x557c2b4a64e8 -> 0x557c2b4a68d8]: [[<-.data.frame [[<-
#> tracemem[0x557c2b4a68d8 -> 0x557c2b4a57b8]:
#> tracemem[0x557c2b4a57b8 -> 0x557c2b4a5ba8]: [[<-.data.frame [[<-
#> tracemem[0x557c2b4a5ba8 -> 0x557c2b4a3048]:
#> tracemem[0x557c2b4a3048 -> 0x557c2b4a3518]: [[<-.data.frame [[<-
untracemem(x)
List (uses internal C code) - Copied once!
y <- as.list(x)
cat(tracemem(y), "\n")
#> <0x557c2e2d89e8>
for (i in seq_along(medians)) {
y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x557c2e2d89e8 -> 0x557c2e5c9648]:
untracemem(y)
Benchmark this (Exercise #2)
First wrap in a function
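The definition of med() used in the benchmarks is not shown in these notes. A minimal sketch, assuming it simply wraps the loop from the example above (the argument name d is hypothetical):

```r
# Hypothetical wrapper: subtract each column's median from that column.
# Works for both data frames and lists because it only uses [[ and [[<-.
med <- function(d, medians) {
  for (i in seq_along(medians)) {
    d[[i]] <- d[[i]] - medians[[i]]
  }
  d
}
```

Because the function takes a data frame or a list, the same body can be timed on both inputs.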
Try with 5 columns
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)
bench::mark(
"data.frame" = med(x, medians),
"list" = med(y, medians)
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 data.frame 87.1µs 90.6µs 10283. 410KB 159.
#> 2 list 34.5µs 35.6µs 26140. 391KB 261.
Try with 20 columns
x <- data.frame(matrix(runif(5 * 1e4), ncol = 20))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)
bench::mark(
"data.frame" = med(x, medians),
"list" = med(y, medians)
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 data.frame 252.6µs 264.6µs 3747. 400KB 42.3
#> 2 list 38.4µs 39.8µs 24417. 392KB 249.
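WOW! With 20 columns the data frame version is about 6.6x slower than the list at the median (264.6µs vs. 39.8µs), compared with about 2.5x with 5 columns (90.6µs vs. 35.6µs).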
Environments
- Always modified in place (reference semantics)
- Interesting because when you modify an environment, all existing bindings to that environment continue to have the same reference
- If two names point to the same environment, and you update one, you update both!
- This means that environments can contain themselves (!)
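The points above can be sketched with base R environments. The names e1 and e2 are illustrative and not from the original notes.

```r
# Two names bound to the same environment
e1 <- new.env()
e1$a <- 1
e2 <- e1

# Modifying through one name is visible through the other (no copy is made)
e1$a <- 100
e2$a

# An environment can hold a binding that points back to itself
e1$self <- e1
identical(e1$self, e1)
```

Here e2$a reflects the update made through e1, and e1$self is the very same environment as e1.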