Measuring performance

Learning objectives:

  • Understand how to figure out where R is spending time executing your code
  • Learn about the tools for benchmarking your code
  • Implement code profiling

Introduction

“Before you can make your code faster, you first need to figure out what’s making it slow.”

[Image: “Slow down to learn how to code faster” (credit: packtpub.com)]

  • profile your code: measure the run time of each line using realistic inputs
  • experiment with alternatives to find faster code
  • microbenchmark to measure the difference in performance

Packages introduced

library(profvis)
library(bench)
library(microbenchmark)

Profiling

Profiling the code is the process of gathering information about the resource requirements of the individual calls in your code.

Profilers sample code performance by interrupting execution every few milliseconds, recording the call stack at that moment, and aggregating which functions the time and memory are spent in.

Example:

f <- function() {
  pause(0.1)
  g()
  h()
}
g <- function() {
  pause(0.1)
  h()
}
h <- function() {
  pause(0.1)
}

Profiling

Profile the execution of f():

- `profvis::pause()` is used instead of `Sys.sleep()`,
  because `Sys.sleep()` consumes no CPU time and so would not appear in the profile
- profile `f()`, with `utils::Rprof()`
tmp <- tempfile()
Rprof(tmp, interval = 0.1)
f()
#> NULL
Rprof(NULL)
writeLines(readLines(tmp))
#> sample.interval=100000
#> "pause" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main" 
#> "pause" "g" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main" 
#> "pause" "h" "g" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main" 
#> "pause" "h" "f" "eval" "eval" "withVisible" "withCallingHandlers" "eval" "eval" "with_handlers" "doWithOneRestart" "withOneRestart" "withRestartList" "doWithOneRestart" "withOneRestart" "withRestartList" "withRestarts" "evaluate::evaluate" "evaluate" "in_dir" "in_input_dir" "eng_r" "block_exec" "call_block" "process_group" "withCallingHandlers" "with_options" "xfun:::handle_error" "process_file" "knitr::knit" "rmarkdown::render" "execute" ".main"

Visualising profiles

Makes it easier to build up a mental model of what you need to change:

- `profvis::profvis(expr)`
- `utils::summaryRprof()`
source("scripts/profiling-example.R")
profvis(f())
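
A text-only alternative to the profvis viewer, sketched here by reusing `f()` and the `Rprof()` workflow from above: `utils::summaryRprof()` aggregates the raw stack samples into per-function timings.

```r
# Sketch: summarise Rprof() samples without a graphical viewer.
tmp <- tempfile()
Rprof(tmp, interval = 0.01)   # sample every 10 ms for finer resolution
f()                           # f(), g(), h() as defined earlier
Rprof(NULL)
summaryRprof(tmp)$by.self     # time spent in each function itself
summaryRprof(tmp)$by.total    # time including all callees
```

`by.total` should attribute all of the elapsed time to `f()` (everything runs inside it), while `by.self` shows that the time really belongs to `pause()` alone.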

Mini-case study: poor practice of copy-on-modify

Profiling a loop that modifies an existing variable: most of the time is spent in garbage collection (i.e. releasing and reclaiming memory that is no longer needed).

profvis::profvis({
  x <- integer()
  for (i in 1:1e4) {
    x <- c(x, i)
  }
})

You can identify the source of the problem by looking at the memory column: copy-on-modify creates another copy of x in each iteration of the loop.

  • All the time is being spent in the concatenation operation.
  • Nearly the same amount of memory is being asked for and is being released.
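
A minimal sketch of the usual fix for the loop above: pre-allocate the full vector once and fill it in place, so no per-iteration copy is triggered.

```r
profvis::profvis({
  x <- integer(1e4)   # allocate all 10,000 slots up front
  for (i in 1:1e4) {
    x[i] <- i         # single reference, so R modifies x in place
  }
})
```

The concatenation (and the associated garbage collection) disappears from the profile; of course, the vectorised `x <- seq_len(1e4)` avoids the loop entirely.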

Limitations

  • Profiling does not extend to C code.
    • C code can be profiled with its own tools in C IDEs.
  • Anonymous functions do not show up properly.
  • Lazy evaluation of function arguments makes things complicated.
    • An argument may be evaluated inside another function, so which line should the time be attributed to?

Exercise

Where is the time being spent in this function?

profvis::profvis({
  f <- function(n = 1e5) {
    x <- rep(1, n)
    rm(x)
  }
  f()
}, torture = 10)
?rm

Solution

Microbenchmarking

In software development, microbenchmarking means measuring the time (or other aspects of performance) of a very small piece of code. You expect the code to run in microseconds, but you know that in your larger project it will run thousands or millions of times. It is particularly useful when you are comparing different implementations of a very frequently repeated step and want to maximize performance.


Microbenchmarking

x <- runif(100)
uniroot_sqrt <- function(x) {
  solution <- x
  for(i in seq_along(x)) {
    solution[i] <- uniroot( 
      f = function(z, a) { z*z - a }, 
      interval = c(0,1), 
      tol = 5 * .Machine$double.eps,
      a = x[i]) |> 
      purrr::pluck('root')
  }
  solution
}

Microbenchmarking

mb1 <- microbenchmark(
  sqrt(x),
  x ^ 0.5,
  exp( 0.5 * log(x)),
  uniroot_sqrt(x)
)
mb1
#> Unit: nanoseconds
#>               expr     min      lq    mean  median      uq      max neval
#>            sqrt(x)     300     500    1116     700    1600     5600   100
#>              x^0.5    2900    3300    4082    3700    4550    10700   100
#>  exp(0.5 * log(x))    8500    9400   10477   10100   11250    19900   100
#>    uniroot_sqrt(x) 2578200 3201350 3697142 3344050 3506400 26347800   100

The underlying data.frame-like object has only two columns, expr and time, with one row per run. What you see above is the summary() method applied to it. (Output of plain mb1 differs between rendering systems; interactively it is passed through the print() method.)
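
Since the raw object holds one timing per run, you can inspect or plot the full distribution yourself; a sketch, assuming mb1 from above:

```r
head(mb1$time)    # raw timings in nanoseconds, one per run
table(mb1$expr)   # 100 runs per expression by default
boxplot(mb1)      # {microbenchmark} ships a boxplot() method
```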

Microbenchmarking

Much greater detail is provided by the {bench} package: bench::mark().

lb <- bench::mark(
  sqrt(x),
  x ^ 0.5,
  exp( 0.5 * log(x)),
  uniroot_sqrt(x)
)
lb
#> # A tibble: 4 × 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sqrt(x)              300ns    400ns  1772924.      848B      0  
#> 2 x^0.5                2.9µs    3.3µs   300156.      848B      0  
#> 3 exp(0.5 * log(x))    7.9µs    9.2µs   106807.      848B      0  
#> 4 uniroot_sqrt(x)     2.51ms   3.35ms      289.      848B     30.8

Again, what you see above is summary(lb). The timing of each individual run is stored inside the object's list-columns. (Output of plain lb differs between rendering systems; interactively it is passed through the print() method.)
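
A sketch of digging the per-run data out of those list-columns (column names as in current {bench} versions):

```r
lb$time[[1]]    # every individual timing of sqrt(x), as a bench_time vector
lb$gc[[4]]      # garbage-collection events during the uniroot_sqrt(x) runs
lb[["n_itr"]]   # how many iterations each expression actually ran
```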

Microbenchmarking distributions

plot(lb)

Results are typically heavily skewed: some runs may have hit “unfortunate” moments when the system was busy with something else. Pay more attention to medians than to means. The uniroot-based calculation was so slow that bench::mark() stopped after a few hundred runs, rather than the default 10,000 iterations, so as to fit the default time budget of 0.5 sec per expression.

Other considerations

Things that can/will affect performance, some are hard to measure:

  • Scaling

    • bubble sort: \(O(n^2)\); efficient comparison sorts (e.g. merge sort): \(O(n \log n)\)
    • matrix inversion: \(O(n^3)\) in linear algebra classes, \(O(n^{2.37})\) with the best known algorithms
  • I/O bottlenecks

    • CPU cache is \(\approx\) 10–100× faster than RAM, which is several orders of magnitude faster than a local HDD, which in turn is \(\approx\) 10–100× faster than network/cloud storage
  • Laziness and prudence: when things get evaluated and put into memory

  • Understand how parallel processing frameworks (future, mirai) pass objects between instances, how tasks depend on each other’s completion, and when they must wait for one another
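
Scaling behaviour can be measured directly rather than reasoned about: `bench::press()` reruns a benchmark over a grid of parameter values. A sketch comparing the grow-by-one loop from the mini-case study with its pre-allocated fix:

```r
results <- bench::press(
  n = c(1e3, 1e4, 3e4),
  bench::mark(
    grow     = { x <- integer();  for (i in 1:n) x <- c(x, i); x },
    prealloc = { x <- integer(n); for (i in 1:n) x[i] <- i;    x }
  )
)
results[, c("expression", "n", "median")]
```

The pre-allocated version should scale roughly linearly in `n`, while the growing version degrades quadratically.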

Resources