“Before you can make your code faster, you first need to figure out what’s making it slow.”
SLOW DOWN TO LEARN HOW TO CODE FASTER | credits: packtpub.com
Profiling the code is the process of gathering information about the resource requirements of the individual calls in your code.
Sampling profilers measure code performance by stopping execution every few milliseconds, recording the call stack at that moment, and noting the resource utilization.
Example:
Profile the execution of f():
- `profvis::pause()` is used instead of `Sys.sleep()`, because the latter makes R use no resources and therefore does not show up in profiles
- profile `f()`, with `utils::Rprof()`
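A minimal setup consistent with the call stacks below. The exact bodies of `f()`, `g()`, and `h()` are an assumption (modeled on the classic profiling example); note that `sample.interval=100000` in the output is in microseconds, i.e. an `interval` of 0.1 s:

```r
library(profvis)

f <- function() {
  pause(0.1)  # busy-waits, so it appears in the profile (unlike Sys.sleep())
  g()
  h()
}
g <- function() {
  pause(0.1)
  h()
}
h <- function() {
  pause(0.1)
}

tmp <- tempfile()
Rprof(tmp, interval = 0.1)  # sample the call stack every 0.1 s
f()
Rprof(NULL)                 # stop profiling
writeLines(readLines(tmp))  # raw samples, innermost call first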
#> NULL
#> sample.interval=100000
#> "pause" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main"
#> "pause" "g" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main"
#> "pause" "h" "g" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main"
#> "pause" "h" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main"
These tools make it easier to build up a mental model of what you need to change:
- `profvis::profvis(expr)`
- `utils::summaryRprof()`
Profiling a loop that modifies an existing variable often shows that most of the time is spent in garbage collection (i.e. releasing and reclaiming memory that is no longer needed).
You can identify the source of the problem by looking at the memory column. In this case, copy-on-modify creates another copy of x in each iteration of the loop.
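A hypothetical loop of this kind: growing `x` with `c()` triggers copy-on-modify on every iteration, and the discarded copies keep the garbage collector busy.

```r
library(profvis)

profvis({
  x <- integer()
  for (i in 1:1e4) {
    x <- c(x, i)  # each assignment copies the whole of x; the old copy
  }               # must then be garbage-collected: watch the memory column
})
```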
Where is the time being spent in this function?
See `?rm`.
In software development, microbenchmarking means measuring the time (or other performance aspects) of a very small piece of code. You expect the code to run in microseconds, but you know that in your larger project it will run thousands or millions of times. It is particularly useful when you are comparing different implementations of a very frequently repeated step and want to maximize performance.
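The summary below can be produced with code along these lines ({microbenchmark} package). The definition of `uniroot_sqrt()` and the input `x` are assumptions, since neither is shown above:

```r
library(microbenchmark)

# hypothetical root-finding implementation of the square root
uniroot_sqrt <- function(x) {
  vapply(x, function(xi) {
    uniroot(function(y) y^2 - xi, interval = c(0, xi + 1))$root
  }, numeric(1))
}

x <- runif(100)  # assumed: a short numeric vector

mb1 <- microbenchmark(
  sqrt(x),
  x^0.5,
  exp(0.5 * log(x)),
  uniroot_sqrt(x),
  times = 100  # matches neval = 100 in the summary
)
summary(mb1)
```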
#> Unit: nanoseconds
#> expr min lq mean median uq max neval
#> sqrt(x) 300 500 1116 700 1600 5600 100
#> x^0.5 2900 3300 4082 3700 4550 10700 100
#> exp(0.5 * log(x)) 8500 9400 10477 10100 11250 19900 100
#> uniroot_sqrt(x) 2578200 3201350 3697142 3344050 3506400 26347800 100
The underlying data.frame-like object contains only expr and time. What you see above is the summary() method applied to it. (Output of plain mb1 differs between rendering systems; interactively it is passed through the print() method.)
Much greater detail is provided by the {bench} package: bench::mark().
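A call along these lines produces the tibble below, using the same expressions as in the microbenchmark comparison (so `x` and `uniroot_sqrt()` are assumed to be defined as before). `check = FALSE` is an assumption here: by default `bench::mark()` verifies that all expressions return equivalent results, and the uniroot-based root only matches `sqrt()` approximately.

```r
library(bench)

lb <- bench::mark(
  sqrt(x),
  x^0.5,
  exp(0.5 * log(x)),
  uniroot_sqrt(x),
  check = FALSE  # uniroot's root is only an approximation of sqrt()
)
summary(lb)
```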
#> # A tibble: 4 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sqrt(x) 300ns 400ns 1772924. 848B 0
#> 2 x^0.5 2.9µs 3.3µs 300156. 848B 0
#> 3 exp(0.5 * log(x)) 7.9µs 9.2µs 106807. 848B 0
#> 4 uniroot_sqrt(x) 2.51ms 3.35ms 289. 848B 30.8
Again, what you see above is summary(lb). Timing for each run is inside the S3 class components. (Output of plain lb differs between rendering systems; interactively it is passed through the print() method.)
Results are typically heavily skewed: some runs may have hit “unfortunate” moments when the system was busy with something else. Pay more attention to medians than to means. The uniroot-based calculation was so slow that bench::mark() decided to stop after a few hundred runs rather than the default 10000, so as to fit the total time budget of 0.5 sec.
Things that can/will affect performance, some of which are hard to measure:
Scaling
I/O bottlenecks
Laziness: when things actually get evaluated and put into memory
Understand how parallel processing (future, mirai) passes objects between R instances, how tasks depend on each other’s completion, and where they must wait for one another
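A minimal {future} sketch of that last point (the plan and object sizes are illustrative; mirai offers an analogous daemons()/mirai() API):

```r
library(future)
plan(multisession, workers = 2)  # two background R sessions

big <- rnorm(1e6)
f1 <- future(sum(big))   # 'big' is serialized and copied to the worker
f2 <- future(mean(big))  # runs concurrently on the second worker

value(f1)  # blocks here until the first worker has finished
value(f2)
```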