Modify-in-place
- Modifying an object usually creates a copy, except for:
  - Objects with a single binding (performance optimization)
  - Environments (special)
Objects with a single binding
- Hard to know if copy will occur
- If an object has 2+ bindings and some are removed, R may not track the count back down to one, so it can still treat the object as shared
- As a result, R may make a copy even if there's only one binding left
- Calling a function on an object usually creates a reference to it; the exception is primitive functions implemented in C
- Best to use tracemem() to check rather than guess.
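As a rough illustration of the points above, the following sketch traces a vector that has two bindings. It assumes an R build with memory profiling enabled; the names `v` and `w` are made up for this example. The addresses printed, and whether the second assignment reports a copy, can differ between a plain R session, RStudio, and knitr.

```r
v <- c(1, 2, 3)
w <- v                    # second binding to the same vector
cat(tracemem(v), "\n")    # print the address being traced

v[[3]] <- 4               # v is shared with w, so a copy is expected here
v[[1]] <- 10              # v now has its own copy; often no further copy is reported

untracemem(v)
w                         # w still holds the original values 1 2 3
```

Whatever tracemem() reports, `w` keeps the original values, which is the user-visible guarantee of copy-on-modify.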
Example - lists vs. data frames in for loop
Setup
Create the data to modify
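The setup code itself is not shown in these notes. A plausible reconstruction, assuming the same 5-column data that the benchmark section further down uses, would be:

```r
# Assumed setup: 5 numeric columns of random data, plus one median per column
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
```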
Data frame - Copied on every iteration! (The trace below reports 10 copies.)
cat(tracemem(x), "\n")
#> <0x560e51c1e378>
for (i in seq_along(medians)) {
x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x560e51c1e378 -> 0x560e515d4958]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e515d4958 -> 0x560e514763c8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e514763c8 -> 0x560e51476438]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e51476438 -> 0x560e514764a8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e514764a8 -> 0x560e51476518]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e51476518 -> 0x560e51476588]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e51476588 -> 0x560e514765f8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e514765f8 -> 0x560e51476668]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e51476668 -> 0x560e514766d8]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
#> tracemem[0x560e514766d8 -> 0x560e51476748]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
untracemem(x)
List (uses internal C code) - Copied once!
y <- as.list(x)
cat(tracemem(y), "\n")
#> <0x560e518aa458>
for (i in seq_along(medians)) {
y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x560e518aa458 -> 0x560e519acb48]: eval eval withVisible withCallingHandlers eval eval with_handlers doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList doWithOneRestart withOneRestart withRestartList withRestarts <Anonymous> evaluate in_dir in_input_dir eng_r block_exec call_block process_group withCallingHandlers <Anonymous> process_file <Anonymous> <Anonymous> render_cur_session <Anonymous>
untracemem(y)
Benchmark this (Exercise #2)
First wrap in a function
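The definition of `med()` is not included in these notes. A minimal sketch, assuming it simply wraps the loop from the tracemem() example, might look like the following. It returns NULL on purpose: bench::mark() checks by default (`check = TRUE`) that all expressions return equal results, and a data frame and a list would not compare equal. Returning `d` instead would need `check = FALSE` in the calls below.

```r
# Assumed helper: subtract the matching median from each column/element.
# Works for both data frames and lists because both support `[[<-`.
med <- function(d, medians) {
  for (i in seq_along(medians)) {
    d[[i]] <- d[[i]] - medians[[i]]
  }
  # Discard the modified copy; only the time and memory cost is of interest here
  invisible(NULL)
}
```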
Try with 5 columns
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)
bench::mark(
"data.frame" = med(x, medians),
"list" = med(y, medians)
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 data.frame 86.9µs 90.9µs 10473. 410KB 162.
#> 2 list 34.7µs 35.9µs 25451. 391KB 256.
Try with 20 columns
x <- data.frame(matrix(runif(5 * 1e4), ncol = 20))
medians <- vapply(x, median, numeric(1))
y <- as.list(x)
bench::mark(
"data.frame" = med(x, medians),
"list" = med(y, medians)
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 data.frame 254.3µs 270.1µs 3675. 400KB 40.8
#> 2 list 38.4µs 40.4µs 23985. 392KB 246.
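WOW! With 20 columns the data frame version takes about 6.7x as long as the list version (median 270.1µs vs. 40.4µs), up from about 2.5x with 5 columns (90.9µs vs. 35.9µs).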
Environments
- Always modified in place (reference semantics)
- Interesting because modifying an environment does not create a new object, so every existing binding to it still refers to the same, modified environment
- If two names point to the same environment, and you update one, you update both!
- This means that environments can contain themselves (!)
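The bullets above can be sketched with base R alone. The names `e1`, `e2`, `a`, and `self` are made up for this example.

```r
e1 <- new.env()
e1$a <- 1

e2 <- e1            # e2 is a second name for the same environment, not a copy
e2$a <- 99          # modify through one name...
e1$a                # ...and the change is visible through the other: 99

e1$self <- e1       # an environment can hold a reference to itself
identical(e1$self, e1)   # TRUE
```

Contrast this with a list, where `l2 <- l1; l2$a <- 99` would leave `l1$a` unchanged because the list is copied on modification.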