Functionals

Learning objectives

  • Describe the “functional style” of programming in R.
  • Replace for loops with functionals.
  • Use the purrr::map() family of functions to apply a function to each element of a list or vector.
  • Combine multiple functionals to solve complex problems.
  • Use purrr::reduce() to combine elements of a vector into a single result.
  • Use predicate functionals to work with logical conditions.
  • Recognize and use base R functionals that lack purrr equivalents.

R is a functional language at heart

  • R lends itself to a style of problem solving centered on functions.
  • This “functional style” is a good fit for data analysis problems.
  • Functional techniques can produce efficient and elegant solutions.

Functional programming languages have first-class functions

A key feature of functional languages is their use of first-class functions.

In R, this means you can:

  • Assign functions to variables.
  • Store them in lists.
  • Pass them as arguments to other functions.
  • Create them inside functions.
  • Return them as the result of a function.
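
As a minimal base R sketch of the five points above (all names here, such as square, summaries, apply_to_sample, and make_power, are made up for illustration):

```r
# Assign a function to a variable.
square <- function(x) x^2

# Store functions in a list.
summaries <- list(mean = mean, median = median)

# Pass a function as an argument to another function.
apply_to_sample <- function(f) f(c(1, 2, 3, 10))

# Create a function inside a function and return it as the result.
make_power <- function(exponent) {
  function(x) x^exponent
}
cube <- make_power(3)

square(4)                     # 16
summaries$median(c(1, 5, 9))  # 5
apply_to_sample(mean)         # 4
cube(2)                       # 8
```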

Pure functions depend only on their inputs

Many functional languages require functions to be pure.

  • A pure function’s output only depends on its inputs.
    • runif(), read.csv(), and Sys.time() are not pure.
  • A pure function has no side-effects (e.g., changing global variables, writing to disk).
    • print(), write.csv(), and <- are not pure.

R is not a strictly functional language because it doesn’t require pure functions.
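
The contrast can be sketched with two hypothetical functions: add_tax() is pure, while next_id() both reads and modifies a variable outside itself.

```r
# Pure: the result depends only on the arguments, and nothing else changes.
add_tax <- function(price, rate) price * (1 + rate)

# Impure: reads and modifies a variable outside the function (a side effect).
counter <- 0
next_id <- function() {
  counter <<- counter + 1
  counter
}

add_tax(100, 0.5)    # 150, no matter how often it is called
first <- next_id()   # 1
second <- next_id()  # 2: same call, different result
```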

The “functional style” decomposes a big problem into smaller pieces

  • Solve each piece with a function or combination of functions.
  • Strive to create isolated functions that operate independently.
  • Complexity is handled by composing functions in various ways.

Key functional techniques

  • Chapter 9: Functionals: Functions that take a function as an argument.
  • Chapter 10: Function factories: Functions that create functions.
  • Chapter 11: Function operators: Functions that take functions as input and produce functions as output.

A functional takes a function as input and returns a vector as output

randomise <- function(f) f(runif(1e3))
randomise(mean)
#> [1] 0.4903279
randomise(sum)
#> [1] 500.9905

Common examples:

  • lapply(), apply(), and tapply() in base R
  • purrr::map()
  • Mathematical functionals like integrate() or optim()

Functionals are better than for loops

To become significantly more reliable, code must become more transparent. In particular, nested conditions and loops must be viewed with great suspicion. Complicated control flows confuse programmers. Messy code often hides bugs.

— Bjarne Stroustrup

  • for loops are too flexible: you can see that the code iterates, but not why.
  • Each functional is tailored to a specific task, so its name conveys intent.
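
A small side-by-side sketch, assuming a made-up list xs, shows the difference in how much the reader has to decode:

```r
library(purrr)

# Hypothetical input: three numeric vectors of different lengths.
xs <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# for loop: allocation, indexing, and assignment are all manual.
loop_means <- vector("double", length(xs))
for (i in seq_along(xs)) {
  loop_means[[i]] <- mean(xs[[i]])
}
names(loop_means) <- names(xs)

# Functional: one call states the intent and guarantees a double vector.
map_means <- map_dbl(xs, mean)

# Both approaches compute the same values.
all.equal(loop_means, map_means)
```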

purrr::map() applies a function to each element of a vector

map(1:3, f) is equivalent to list(f(1), f(2), f(3)).

library(purrr)
triple <- function(x) x * 3
map(1:3, triple)
#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 6
#> 
#> [[3]]
#> [1] 9

Use map_<type>() to return an atomic vector

  • map() returns a list.
  • map_lgl() returns a logical vector.
  • map_int() returns an integer vector.
  • map_dbl() returns a double vector.
  • map_chr() returns a character vector.

Use map_<type>() to return an atomic vector (cont.)

# map_chr() always returns a character vector
map_chr(mtcars, typeof)
#>      mpg      cyl     disp       hp     drat       wt     qsec       vs 
#> "double" "double" "double" "double" "double" "double" "double" "double" 
#>       am     gear     carb 
#> "double" "double" "double"
# map_dbl() always returns a double vector
map_dbl(mtcars, mean)
#>        mpg        cyl       disp         hp       drat         wt       qsec 
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
#>         vs         am       gear       carb 
#>   0.437500   0.406250   3.687500   2.812500
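
The type suffix also acts as a check: each call must return exactly one value of a compatible type. The sketch below uses a made-up list x and catches the error so that the script keeps running.

```r
library(purrr)

x <- list(1:3, 4:6)

map_int(x, length)      # integer vector: 3 3
map_dbl(x, mean)        # double vector: 2 5
map_lgl(x, is.integer)  # logical vector: TRUE TRUE
map_chr(x, typeof)      # character vector: "integer" "integer"

# range() returns two values per element, so map_dbl() signals an error.
too_long <- tryCatch(map_dbl(x, range), error = function(e) "error")
too_long
```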

Use anonymous functions for concise operations

map_dbl(mtcars, function(x) length(unique(x)))
#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#>   25    3   27   22   22   29   30    2    2    3    6

purrr provides the ~ (formula) shortcut, with .x standing for the argument:

map_dbl(mtcars, ~ length(unique(.x)))
#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#>   25    3   27   22   22   29   30    2    2    3    6

R 4.1.0 and later provide the \() shortcut (\ is shorthand for function):

map_dbl(mtcars, \(x) length(unique(x)))
#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#>   25    3   27   22   22   29   30    2    2    3    6

Pass additional arguments to map() with ... or via anonymous function

x <- list(1:5, c(1:10, NA))
map_dbl(x, mean, na.rm = TRUE)
#> [1] 3.0 5.5
map_dbl(x, ~ mean(.x, na.rm = TRUE))
#> [1] 3.0 5.5
map_dbl(x, \(x) mean(x, na.rm = TRUE))
#> [1] 3.0 5.5

purrr style: pipe simple steps together

  • Combining the pipe (|> or %>%) with purrr produces readable code.
  • Each line is a single, understandable step.

mtcars |>
  split(mtcars$cyl) |>
  map(~ lm(mpg ~ wt, data = .x)) |>
  map(coef) |>
  map_dbl(2)
#>         4         6         8 
#> -5.647025 -2.780106 -2.192438

purrr::modify() returns the same type as the input

map() always returns a list:

df <- data.frame(x = 1:3, y = 6:4)
map(df, ~ .x * 2)
#> $x
#> [1] 2 4 6
#> 
#> $y
#> [1] 12 10  8

modify() returns the same type as the input:

modify(df, ~ .x * 2)
#>   x  y
#> 1 2 12
#> 2 4 10
#> 3 6  8
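
One way to picture what modify() does is a loop that overwrites each element in place, so the container keeps its type. simple_modify() below is a hypothetical helper written for illustration, not purrr's actual implementation.

```r
# Replace each element of x with f(element); x keeps its class and attributes.
simple_modify <- function(x, f, ...) {
  for (i in seq_along(x)) {
    x[[i]] <- f(x[[i]], ...)
  }
  x
}

df <- data.frame(x = 1:3, y = 6:4)
doubled <- simple_modify(df, function(col) col * 2)
class(doubled)  # "data.frame"
```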

purrr::map2() iterates over two vectors in parallel

Find a weighted mean from 2 lists: observations (xs) & weights (ws).

xs <- map(1:3, ~ runif(5))
ws <- map(1:3, ~ rpois(5, 5) + 1)

map_dbl() passes the whole ws list as w to every call, so this fails:

map_dbl(xs, weighted.mean, w = ws)
#> Error in `map_dbl()`:
#> ℹ In index: 1.
#> Caused by error in `weighted.mean.default()`:
#> ! 'x' and 'w' must have the same length

map2() iterates over xs and ws in parallel:

map2_dbl(xs, ws, weighted.mean)
#> [1] 0.5589596 0.4285742 0.6900269

purrr::walk() is for functions called for their side effects

  • E.g., cat(), write.csv(), ggsave().
  • walk() returns its input invisibly.

cyls <- split(mtcars, mtcars$cyl)
paths <- file.path(tempdir(), paste0("cyl-", names(cyls), ".csv"))
walk2(cyls, paths, write.csv)
dir(tempdir(), pattern = "cyl-")
#> [1] "cyl-4.csv" "cyl-6.csv" "cyl-8.csv"
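
Because walk() returns its input invisibly, the return value can be captured or piped onward. A small sketch with a made-up character vector:

```r
library(purrr)

steps <- c("load", "clean", "plot")

# cat() is called only for its side effect (printing); walk() discards its result.
out <- walk(steps, function(s) cat("running step:", s, "\n"))

# The input comes back unchanged.
identical(out, steps)
```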

purrr::imap() iterates over values and indices

With named input, imap(.x, .f) is equivalent to map2(.x, names(.x), .f):

imap_chr(iris[, 1:2], ~ paste0("The first value of '", .y, "' is ", .x[[1]]))
#>                               Sepal.Length 
#> "The first value of 'Sepal.Length' is 5.1" 
#>                                Sepal.Width 
#>  "The first value of 'Sepal.Width' is 3.5"

With unnamed input, imap(.x, .f) is equivalent to map2(.x, seq_along(.x), .f):

x <- map(1:2, ~ sample(100, 5))
imap_chr(x, ~ paste0("The max of element ", .y, " is ", max(.x)))
#> [1] "The max of element 1 is 70" "The max of element 2 is 82"
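
Both equivalences can be checked directly. In this sketch, label() and the two input vectors are made up for illustration.

```r
library(purrr)

# A hypothetical two-argument function: value first, then name or index.
label <- function(value, name) paste0(name, " = ", value)

named <- c(a = 1, b = 2)
named_imap <- imap_chr(named, label)
named_map2 <- map2_chr(named, names(named), label)

unnamed <- c(10, 20)
unnamed_imap <- imap_chr(unnamed, label)
unnamed_map2 <- map2_chr(unnamed, seq_along(unnamed), label)

unname(named_imap)    # "a = 1" "b = 2"
unname(unnamed_imap)  # "1 = 10" "2 = 20"
```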

purrr::pmap() iterates over multiple arguments in a list

pmap() applies a function to a list of argument vectors, making one call per position.

  • map2(x, y, f) is equivalent to pmap(list(x, y), f).
  • A data frame is a list, so it works well with pmap(): each row supplies the arguments for one call.

params <- tibble::tribble(
  ~n, ~min, ~max,
  1L, 0, 1,
  2L, 10, 100,
)

pmap(params, runif)
#> [[1]]
#> [1] 0.324684
#> 
#> [[2]]
#> [1] 86.14710 73.00569

purrr::reduce() combines vector elements with a binary function

  • “Reduces” a vector to a single value by repeatedly applying a two-argument function.
  • reduce(1:4, f) is equivalent to f(f(f(1, 2), 3), 4).

Example: Find the numbers that appear in every vector in a list.

set.seed(123)
lst <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
reduce(lst, intersect)
#> [1] 10  5  9
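
To see the order of the calls, reduce() can be given a binary function that records how it was called instead of computing anything. show_call() is a hypothetical helper for illustration.

```r
library(purrr)

# Builds a string describing the call rather than evaluating it.
show_call <- function(acc, nxt) paste0("f(", acc, ", ", nxt, ")")

nested <- reduce(1:4, show_call)
nested
#> [1] "f(f(f(1, 2), 3), 4)"

# With a real binary function, the same nesting yields one value.
total <- reduce(1:4, `+`)
total
#> [1] 10
```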

purrr::accumulate() shows intermediate results

  • Like reduce(), but returns all the intermediate results.
  • Great way to understand how reduce() works.

accumulate(lst, intersect)
#> [[1]]
#>  [1]  3  3 10  2  6  5  4  6  9 10  5  3  9  9  9
#> 
#> [[2]]
#> [1]  3 10  5  4  9
#> 
#> [[3]]
#> [1] 10  5  9
#> 
#> [[4]]
#> [1] 10  5  9

purrr::accumulate() is useful for cumulative calculations

accumulate(c(4, 3, 10), `+`)
#> [1]  4  7 17

Predicate functionals apply a predicate to each element

Predicate: a function that returns a single TRUE or FALSE.

  • some() / every() / none(): True for any / all / no elements?
  • detect() / detect_index(): Find value / location of 1st match.
  • keep() / discard(): Keep / drop all matching elements.

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
str(keep(df, is.numeric))
#> 'data.frame':    3 obs. of  1 variable:
#>  $ x: int  1 2 3
str(discard(df, is.numeric))
#> 'data.frame':    3 obs. of  1 variable:
#>  $ y: chr  "a" "b" "c"
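
The other predicate functionals listed above can be sketched on a small made-up list:

```r
library(purrr)

x <- list(1, "a", 3, "b")

some(x, is.character)          # TRUE: at least one element is a string
every(x, is.character)         # FALSE: not all elements are strings
none(x, is.logical)            # TRUE: no element is logical
detect(x, is.character)        # "a": value of the first match
detect_index(x, is.character)  # 2: position of the first match
```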

map_if() and modify_if() transform elements where a predicate is true

E.g., calculate the mean of only the numeric columns in a data frame.

df <- data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c")
)

str(map_if(df, is.numeric, mean))
#> List of 3
#>  $ num1: num 10
#>  $ num2: num 6
#>  $ chr1: chr [1:3] "a" "b" "c"
str(modify_if(df, is.numeric, mean))
#> 'data.frame':    3 obs. of  3 variables:
#>  $ num1: num  10 10 10
#>  $ num2: num  6 6 6
#>  $ chr1: chr  "a" "b" "c"

base::apply() summarizes matrices and arrays

  • Collapses 1 or more matrix/array dimensions by applying a summary function.

apply(X, MARGIN, FUN): MARGIN is the dimension to iterate over, 1 for rows and 2 for columns.

a2d <- matrix(1:20, nrow = 5)
# Row means
apply(a2d, 1, mean)
#> [1]  8.5  9.5 10.5 11.5 12.5
# Column means
apply(a2d, 2, mean)
#> [1]  3  8 13 18

Warning: apply() coerces a data frame to a matrix before iterating, so mixed column types are lost!
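
A short sketch of why this matters, using a made-up data frame with one numeric and one character column:

```r
df <- data.frame(n = c(1, 2), s = c("a", "b"), stringsAsFactors = FALSE)

# as.matrix() needs a single type, so the numbers become strings.
m <- as.matrix(df)
typeof(m)  # "character"

# apply() sees the coerced matrix, so every row is a character vector.
row_types <- apply(df, 1, typeof)
row_types  # "character" "character"

# Iterating over columns as a list keeps each column's own type.
col_types <- vapply(df, typeof, character(1))
col_types  # n: "double", s: "character"
```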

Base R has mathematical functionals

Base R includes several mathematical functionals.

  • integrate(): Find the area under a curve.
  • uniroot(): Find where a function equals zero.
  • optimise(): Find the location of the minimum or maximum value of a function.

integrate(sin, 0, pi)
#> 2 with absolute error < 2.2e-14
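
The other two functionals follow the same pattern: the function comes first, then the interval to search. This sketch uses sin() and a made-up quadratic f; results are approximate, so no exact output is shown.

```r
# uniroot(): sin(x) crosses zero at pi within the interval [3, 4].
root <- uniroot(sin, c(3, 4))
root$root  # close to pi

# optimise(): search [0, 5] for the minimum of a hypothetical quadratic.
f <- function(x) (x - 2)^2 + 1
best <- optimise(f, c(0, 5))
best$minimum    # location of the minimum, close to 2
best$objective  # value of f at that location, close to 1
```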