
Advanced R: Chapters 23 & 24

All about performance

Jon Leslie (@jlesliedata)

Last Updated: 2020-09-02

1 / 30

How to go fast

  • Find out what's making it slow (Chapter 23)
  • Experiment with faster alternatives (Chapter 24)
2 / 30

Part 1: What's making it slow?

Profiling

  • Use a profiler: profvis (a minimal usage sketch follows this list)
  • A sampling/statistical profiler
  • Periodically stops execution and looks at the call stack
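
A minimal profvis usage sketch (my own, not from the slides; f() and g() are hypothetical stand-ins):

library(profvis)

# Two toy functions so the profiler has something to sample;
# profvis::pause() burns CPU time without sleeping, so the samples land here.
f <- function() {
  pause(0.1)
  g()
}
g <- function() {
  pause(0.2)
}

# Periodically samples the call stack while f() runs, then opens an
# interactive flame graph in the viewer.
profvis(f())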
3 / 30

Memory profiling

x <- integer()
for (i in 1:1e4) {
  x <- c(x, i)   # c() copies the whole of x on every iteration
}

[profvis memory profile of the loop above]

This shows that large amounts of memory are being allocated (bar on the right) and freed up (bar on the left).
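
A hedged sketch of the usual fix (mine, not from the slides): pre-allocate the result so each iteration fills the vector in place instead of copying it.

x <- integer(1e4)   # allocate the full length once
for (i in 1:1e4) {
  x[i] <- i         # fill in place; no repeated copying
}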
5 / 30

Limitations

  • Profiling does not extend to C code.
  • Using anonymous functions can make profiling difficult. Give them names.
  • Lazy evaluation can make things complicated:
i <- function() {
  pause(0.1)
  10
}

j <- function(x) {
  x + 10
}

j(i())

"...profiling would make it seem like i() was called by j() because the argument isn't evaluated until it's needed by j()."

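A small sketch of what that looks like in practice (mine, not from the slides; assumes profvis is loaded and i()/j() are defined as above):

profvis::profvis(j(i()))   # i() is evaluated inside j(), so it shows up under j()

profvis::profvis({
  x <- i()                 # evaluate the argument first...
  j(x)                     # ...and i() is no longer attributed to j()
})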
6 / 30

Profiling Shiny apps

library(shiny)

profvis({
  runExample(example = "06_tabsets", display.mode = "normal")
})
7 / 30

8 / 30

Part 1: What's making it slow?

Microbenchmarking

  • For very small bits of code
  • Beware of generalising to real code: in realistic code, higher-order effects often dominate the cost of the small bits being measured

    "a deep understanding of subatomic physics is not very helpful when baking"

  • We will use the bench package
9 / 30
x <- runif(100)

(lb <- bench::mark(
  sqrt(x),
  x ^ 0.5
))
## # A tibble: 2 x 6
##   expression      min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 sqrt(x)       334ns    700ns   232862.      848B        0
## 2 x^0.5         2.1µs   2.25µs   269744.      848B        0

The median is probably the best metric to use.
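
One handy follow-up (a sketch using the lb object above; relative = TRUE is an argument of bench's summary() method, not shown on the slide):

summary(lb, relative = TRUE)   # rescale so the fastest expression is 1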

10 / 30
plot(lb)

11 / 30

Part 2: Making it go fast!

"Four" techniques:

  1. Organise your code
  2. Look for existing solutions
  3. The importance of being lazy
  4. Vectorise
  5. Avoid the perils of copying data
12 / 30

1. Organise your code

Write a function for each approach:

mean1 <- function(x) mean(x)
mean2 <- function(x) sum(x)/length(x)

Generate representative test cases:

x <- runif(1e5)

Precisely compare the variants (and include unit tests, not shown here):

bench::mark(
  mean1(x),
  mean2(x)
)[c("expression", "min", "median", "itr/sec", "n_gc")]
## # A tibble: 2 x 4
##   expression      min   median `itr/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl>
## 1 mean1(x)    153.9µs  155.6µs     6133.
## 2 mean2(x)     76.2µs   76.9µs    12133.
13 / 30

2. Check for existing solutions

14 / 30

3. Do as little as possible

  • Use a function tailored to a more specific type of input or to a more specific problem:
    • rowSums(), colSums(), rowMeans(), and colMeans() are faster than equivalent invocations that use apply() because they are vectorised
    • vapply() is faster than sapply() because it pre-specifies the output type
    • any(x == 10) is much faster than 10 %in% x because testing equality is simpler than testing set inclusion (see the quick check after this list).
  • Avoid situations where input data has to be coerced into a different type.
    • Example: giving a data frame to a function that requires a matrix, like apply()
  • Some other tips:
    • read.csv(): specify known column types or use readr::read_csv() or data.table::fread()
    • factor(): specify known levels
    • cut(): use labels = FALSE or findInterval()
    • unlist(x, use.names = FALSE) is faster than unlist(x)
    • interaction(): use drop = TRUE if you can
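
A quick check of one of those tips (my own sketch, not from the slides):

x <- runif(1e4)

bench::mark(
  any(x == 10),   # one vectorised equality pass, then a scan for TRUE
  10 %in% x       # built on match(), which does more general matching work
)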
15 / 30

Example: avoiding method dispatch

x <- runif(1e2)

bench::mark(
  mean(x),
  mean.default(x)
)[c("expression", "min", "median", "itr/sec", "n_gc")]
## # A tibble: 2 x 4
##   expression           min   median `itr/sec`
##   <bch:expr>      <bch:tm> <bch:tm>     <dbl>
## 1 mean(x)           2.39µs    3.5µs   251432.
## 2 mean.default(x)    1.2µs   1.39µs   533535.

x <- runif(1e4)

bench::mark(
  mean(x),
  mean.default(x)
)[c("expression", "min", "median", "itr/sec", "n_gc")]
## # A tibble: 2 x 4
##   expression           min   median `itr/sec`
##   <bch:expr>      <bch:tm> <bch:tm>     <dbl>
## 1 mean(x)           17.6µs   18.8µs    50802.
## 2 mean.default(x)   16.3µs   16.6µs    56833.
16 / 30

But beware!

17 / 30

Example 2: avoiding input coercion

as.data.frame() is slow because it coerces each element into a data frame.

You could, instead, store your data in a named list of equal-length vectors:

quickdf <- function(l) {
  class(l) <- "data.frame"
  attr(l, "row.names") <- .set_row_names(length(l[[1]]))
  l
}

l <- lapply(1:26, function(i) runif(1e3))
names(l) <- letters

dplyr::glimpse(l[1:6])
## List of 6
## $ a: num [1:1000] 0.3726 0.9029 0.8664 0.0337 0.8816 ...
## $ b: num [1:1000] 0.946 0.109 0.767 0.237 0.614 ...
## $ c: num [1:1000] 0.659 0.938 0.317 0.414 0.152 ...
## $ d: num [1:1000] 0.559 0.888 0.872 0.917 0.669 ...
## $ e: num [1:1000] 0.933 0.923 0.757 0.407 0.272 ...
## $ f: num [1:1000] 0.452 0.533 0.915 0.198 0.259 ...
18 / 30
bench::mark(
  as.data.frame = as.data.frame(l),
  quick_df = quickdf(l)
)[c("expression", "min", "median", "itr/sec", "n_gc")]
## # A tibble: 2 x 4
##   expression         min   median `itr/sec`
##   <bch:expr>    <bch:tm> <bch:tm>     <dbl>
## 1 as.data.frame 961.29µs    1.1ms      861.
## 2 quick_df        7.07µs   7.78µs   106802.

Caveat:

This approach requires carefully reading through source code!
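
Why the caveat matters (a hedged illustration, adapted from the same chapter of Advanced R): quickdf() skips all of the checks that as.data.frame() performs, so malformed input yields a broken object instead of an error.

# Columns of different lengths are accepted silently; printing the result
# will likely complain about a corrupt data frame.
bad <- quickdf(list(x = 1:3, y = 1:2))
bad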

19 / 30

4. Vectorise

  • Find the existing R function that is implemented in C and most closely applies to your problem
  • Some commonly used functions:
    • rowSums(), colSums(), rowMeans(), and colMeans() (compared with apply() in the sketch after this list)
    • Vectorised subsetting (Chapter 4)
    • Use cut() and findInterval() for converting continuous variables to categorical
    • Be aware of vectorised functions like cumsum() and diff()
    • Use matrix algebra
  • https://www.noamross.net/archives/2014-04-16-vectorization-in-r-why/
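
A small illustration of the first bullet (my sketch, not from the slides): replacing a row-wise apply() with a single vectorised call.

mat <- matrix(runif(1e4), nrow = 100)

bench::mark(
  apply(mat, 1, mean),   # dispatches an R function call for every row
  rowMeans(mat)          # one pass in C over the whole matrix
)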
20 / 30

5. Avoid copying

  • Copying often shows up when an object is grown in a loop with c(), append(), cbind(), rbind(), or paste(): each call allocates a new object and copies everything accumulated so far
random_string <- function() {
  paste(sample(letters, 50, replace = TRUE), collapse = "")
}

strings10 <- replicate(10, random_string())
strings100 <- replicate(100, random_string())

collapse <- function(xs) {
  out <- ""
  for (x in xs) {
    out <- paste0(out, x)
  }
  out
}

bench::mark(
  loop10 = collapse(strings10),
  loop100 = collapse(strings100),
  vec10 = paste(strings10, collapse = ""),
  vec100 = paste(strings100, collapse = ""),
  check = FALSE
)[c("expression", "min", "median", "itr/sec", "n_gc")]
21 / 30
bench::mark(
  loop10 = collapse(strings10),
  loop100 = collapse(strings100),
  vec10 = paste(strings10, collapse = ""),
  vec100 = paste(strings100, collapse = ""),
  check = FALSE
)[c("expression", "min", "median", "itr/sec", "n_gc")]
## # A tibble: 4 x 4
##   expression      min   median `itr/sec`
##   <bch:expr> <bch:tm> <bch:tm>     <dbl>
## 1 loop10      29.01µs  30.66µs    29660.
## 2 loop100    612.86µs 644.28µs     1488.
## 3 vec10        5.37µs   5.71µs   162555.
## 4 vec100      28.17µs  29.06µs    32390.
22 / 30

Case study: t-test

m <- 1000
n <- 50
X <- matrix(rnorm(m * n, mean = 10, sd = 3), nrow = m)
grp <- rep(1:2, each = n/2)
23 / 30

Case study: t-test (cont'd)

Formula interface:

system.time(
  for (i in 1:m) {
    t.test(X[i, ] ~ grp)$statistic
  }
)
##  user  system elapsed
## 0.601   0.014   0.640

Provide two vectors:

system.time(
  for (i in 1:m) {
    t.test(X[i, grp == 1], X[i, grp == 2])$statistic
  }
)
##  user  system elapsed
## 0.140   0.002   0.143
24 / 30

Case study: t-test (cont'd)

Add functionality to save the values:

compT <- function(i) {
  t.test(X[i, grp == 1], X[i, grp == 2])$statistic
}

system.time(t1 <- purrr::map_dbl(1:m, compT))
##  user  system elapsed
## 0.140   0.002   0.143
25 / 30

Case study: t-test (cont'd)

Do less work:

my_t <- function(x, grp) {
  t_stat <- function(x) {
    m <- mean(x)
    n <- length(x)
    var <- sum((x - m) ^ 2) / (n - 1)
    list(m = m, n = n, var = var)
  }
  g1 <- t_stat(x[grp == 1])
  g2 <- t_stat(x[grp == 2])
  se_total <- sqrt(g1$var / g1$n + g2$var / g2$n)
  (g1$m - g2$m) / se_total
}

system.time(t2 <- purrr::map_dbl(1:m, ~ my_t(X[., ], grp)))
##  user  system elapsed
## 0.038   0.015   0.053

stopifnot(all.equal(t1, t2))
26 / 30

Case study: t-test (cont'd)

Vectorise it:

rowtstat <- function(X, grp) {
  t_stat <- function(X) {
    m <- rowMeans(X)
    n <- ncol(X)
    var <- rowSums((X - m) ^ 2) / (n - 1)
    list(m = m, n = n, var = var)
  }
  g1 <- t_stat(X[, grp == 1])
  g2 <- t_stat(X[, grp == 2])
  se_total <- sqrt(g1$var / g1$n + g2$var / g2$n)
  (g1$m - g2$m) / se_total
}

system.time(t3 <- rowtstat(X, grp))
##  user  system elapsed
## 7.515   0.893   0.014

stopifnot(all.equal(t1, t3))
27 / 30
28 / 30

Resources

29 / 30
