Functionals

Learning objectives

Describe the “functional style” of programming in R.
Replace for loops with functionals.
Use the purrr::map() family of functions to apply a function to each element of a list or vector.
Combine multiple functionals to solve complex problems.
Use purrr::reduce() to combine elements of a vector into a single result.
Use predicate functionals to work with logical conditions.
Recognize and use base R functionals that lack purrr equivalents.

R is a functional language at heart

R lends itself to a style of problem solving centered on functions.
This “functional style” is a good fit for data analysis problems.
Functional techniques can produce efficient and elegant solutions.

Functional programming languages have first-class functions

A key feature of functional languages is their use of first-class functions.

In R, this means you can:

Assign functions to variables.
Store them in lists.
Pass them as arguments to other functions.
Create them inside functions.
Return them as the result of a function.
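As a small illustration of the five points above, here is a base R sketch. The names `square`, `funs`, `apply_twice`, and `make_adder` are made up for this example.

```r
# Assign a function to a variable.
square <- function(x) x^2

# Store functions in a list.
funs <- list(square = square, half = function(x) x / 2)

# Pass a function as an argument to another function.
apply_twice <- function(f, x) f(f(x))

# Create a function inside a function and return it as the result.
make_adder <- function(n) {
  function(x) x + n
}
add_ten <- make_adder(10)

funs$half(8)            # 4
apply_twice(square, 3)  # 81
add_ten(5)              # 15
```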

Pure functions depend only on their inputs

Many functional languages require functions to be pure.

A pure function’s output only depends on its inputs.
- runif(), read.csv(), and Sys.time() are not pure.
A pure function has no side-effects (e.g., changing global variables, writing to disk).
- print(), write.csv(), and <- are not pure.

R is not a strictly functional language because it doesn’t require pure functions.
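To make the distinction concrete, the sketch below contrasts a pure function with an impure one. Both functions and the `counter` variable are hypothetical names used only for illustration.

```r
# Pure: the result depends only on the arguments, and nothing else changes.
add_pure <- function(x, y) x + y

# Impure: the result depends on a variable outside the function,
# and each call modifies that variable.
counter <- 0
add_and_count <- function(x) {
  counter <<- counter + 1  # side effect: changes state outside the function
  x + counter              # output depends on more than the argument
}

add_pure(1, 2)              # always 3
first  <- add_and_count(1)  # 2
second <- add_and_count(1)  # 3: same input, different output
```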

The “functional style” decomposes a big problem into smaller pieces

Solve each piece with a function or combination of functions.
Strive to create isolated functions that operate independently.
Complexity is handled by composing functions in various ways.

Key functional techniques

Chapter 9: Functionals: Functions that take a function as an argument.
Chapter 10: Function factories: Functions that create functions.
Chapter 11: Function operators: Functions that take functions as input and produce functions as output.
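The three techniques can be told apart by what goes in and what comes out. The sketch below shows one toy example of each in base R; `col_summary`, `power`, and `drop_na` are illustrative names, not functions from any package.

```r
# Functional: takes a function as input, returns a vector.
col_summary <- function(df, f) vapply(df, f, numeric(1))

# Function factory: takes data as input, returns a function.
power <- function(exp) {
  force(exp)
  function(x) x^exp
}
cube <- power(3)

# Function operator: takes a function as input, returns a modified function.
# Here the returned function drops NAs before calling the original.
drop_na <- function(f) {
  force(f)
  function(x, ...) f(x[!is.na(x)], ...)
}
mean_no_na <- drop_na(mean)

col_summary(data.frame(a = 1:3, b = 4:6), mean)  # a = 2, b = 5
cube(2)                                          # 8
mean_no_na(c(1, NA, 3))                          # 2
```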

A functional takes a function as input and returns a vector as output

randomise <- function(f) f(runif(1e3))
randomise(mean)

#> [1] 0.4903279

randomise(sum)

#> [1] 500.9905

Common examples:

lapply(), apply(), and tapply() in base R
purrr::map()
Mathematical functionals like integrate() or optim()

Functionals are better than for loops

To become significantly more reliable, code must become more transparent. In particular, nested conditions and loops must be viewed with great suspicion. Complicated control flows confuse programmers. Messy code often hides bugs.

— Bjarne Stroustrup

for loops are too flexible: you can see that you're iterating, but not why.
Each functional is tailored to a specific task, so its name conveys intent.
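A side-by-side comparison may help here. The sketch below computes column means of `mtcars` twice, once with a for loop and once with the base R functional `vapply()`; the variable names are chosen for this example.

```r
# With a for loop, the reader has to work out the intent from the body:
# preallocate the output, loop over indices, fill in each slot.
means_loop <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  means_loop[[i]] <- mean(mtcars[[i]])
}
names(means_loop) <- names(mtcars)

# With a functional, the intent is in the call itself:
# "apply mean() to each column and return one double per column".
means_fun <- vapply(mtcars, mean, numeric(1))

all.equal(means_loop, means_fun)  # TRUE
```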

`purrr::map()` applies a function to each element of a vector

map(1:3, f) is equivalent to list(f(1), f(2), f(3)).

library(purrr)
triple <- function(x) x * 3
map(1:3, triple)

#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 6
#> 
#> [[3]]
#> [1] 9

Use `map_<type>()` to return an atomic vector

map() returns a list.
map_lgl() returns a logical vector.
map_int() returns an integer vector.
map_dbl() returns a double vector.
map_chr() returns a character vector.
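The variants listed above can be compared on one small input. The list `x` below is made up for illustration; the last lines show that a result that is not length 1 raises an error rather than being silently reshaped.

```r
library(purrr)

x <- list(a = 1:3, b = c(2.5, 3.5), c = 10)

map_int(x, length)      # integer vector:   a = 3, b = 2, c = 1
map_dbl(x, mean)        # double vector:    a = 2, b = 3, c = 10
map_lgl(x, is.integer)  # logical vector:   a = TRUE, b = FALSE, c = FALSE
map_chr(x, typeof)      # character vector: "integer", "double", "double"

# range() returns two values per element, so map_dbl() refuses it.
# try() is used here only so that the script keeps running.
res <- try(map_dbl(x, range), silent = TRUE)
inherits(res, "try-error")  # TRUE
```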

Use `map_<type>()` to return an atomic vector (cont.)

# map_chr() always returns a character vector
map_chr(mtcars, typeof)

#>      mpg      cyl     disp       hp     drat       wt     qsec       vs 
#> "double" "double" "double" "double" "double" "double" "double" "double" 
#>       am     gear     carb 
#> "double" "double" "double"

# map_dbl() always returns a double vector
map_dbl(mtcars, mean)

#>        mpg        cyl       disp         hp       drat         wt       qsec 
#>  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
#>         vs         am       gear       carb 
#>   0.437500   0.406250   3.687500   2.812500

Use anonymous functions for concise operations

map_dbl(mtcars, function(x) length(unique(x)))

#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#>   25    3   27   22   22   29   30    2    2    3    6

purrr provides the ~ (formula) shortcut:

map_dbl(mtcars, ~ length(unique(.x)))

#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#>   25    3   27   22   22   29   30    2    2    3    6

R 4.1.0 and later provide the \() shorthand (\ is short for function):

map_dbl(mtcars, \(x) length(unique(x)))

#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
#>   25    3   27   22   22   29   30    2    2    3    6

Pass additional arguments to `map()` with `...` or via anonymous function

x <- list(1:5, c(1:10, NA))
map_dbl(x, mean, na.rm = TRUE)

#> [1] 3.0 5.5

map_dbl(x, ~ mean(.x, na.rm = TRUE))

#> [1] 3.0 5.5

map_dbl(x, \(x) mean(x, na.rm = TRUE))

#> [1] 3.0 5.5

`purrr` style: pipe simple steps together

Pipe (|> or %>%) + purrr = readable code.
Each line is a single, understandable step.

mtcars |>
  split(mtcars$cyl) |>
  map(~ lm(mpg ~ wt, data = .x)) |>
  map(coef) |>
  map_dbl(2)

#>         4         6         8 
#> -5.647025 -2.780106 -2.192438

`purrr::modify()` returns the same type as the input

map() always returns a list:

df <- data.frame(x = 1:3, y = 6:4)
map(df, ~ .x * 2)

#> $x
#> [1] 2 4 6
#> 
#> $y
#> [1] 12 10  8

modify() returns the same type as the input:

modify(df, ~ .x * 2)

#>   x  y
#> 1 2 12
#> 2 4 10
#> 3 6  8

`purrr::map2()` iterates over two vectors in parallel

Find a weighted mean from two lists: observations (xs) and weights (ws).

xs <- map(1:3, ~ runif(5))
ws <- map(1:3, ~ rpois(5, 5) + 1)

map() passes the whole ws list as w to every call, so it fails:

map_dbl(xs, weighted.mean, w = ws)

#> Error in `map_dbl()`:
#> ℹ In index: 1.
#> Caused by error in `weighted.mean.default()`:
#> ! 'x' and 'w' must have the same length

map2() iterates over xs and ws in parallel:

map2_dbl(xs, ws, weighted.mean)

#> [1] 0.5589596 0.4285742 0.6900269

`purrr::walk()` is for functions called for their side effects

E.g., cat(), write.csv(), ggsave().
walk() returns its input invisibly.

cyls <- split(mtcars, mtcars$cyl)
paths <- file.path(tempdir(), paste0("cyl-", names(cyls), ".csv"))
walk2(cyls, paths, write.csv)
dir(tempdir(), pattern = "cyl-")

#> [1] "cyl-4.csv" "cyl-6.csv" "cyl-8.csv"

`purrr::imap()` iterates over values and indices

With named input, imap(.x, .f) is equivalent to map2(.x, names(.x), .f):

imap_chr(iris[, 1:2], ~ paste0("The first value of '", .y, "' is ", .x[[1]]))

#>                               Sepal.Length 
#> "The first value of 'Sepal.Length' is 5.1" 
#>                                Sepal.Width 
#>  "The first value of 'Sepal.Width' is 3.5"

With unnamed input, imap(.x, .f) is equivalent to map2(.x, seq_along(.x), .f):

x <- map(1:2, ~ sample(100, 5))
imap_chr(x, ~ paste0("The max of element ", .y, " is ", max(.x)))

#> [1] "The max of element 1 is 70" "The max of element 2 is 82"
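Both equivalences can be checked directly. In the sketch below, `fmt`, `named`, and `unnamed` are illustrative names; the same formatting function is run through imap_chr() and map2_chr() and the results are compared.

```r
library(purrr)

fmt <- function(value, key) paste0(key, " -> ", value)

# Named input: the second argument is the element's name.
named <- c(a = 1, b = 2)
by_imap <- imap_chr(named, fmt)
by_map2 <- map2_chr(named, names(named), fmt)
identical(by_imap, by_map2)  # TRUE

# Unnamed input: the second argument is the element's position.
unnamed <- c(10, 20)
identical(
  imap_chr(unnamed, fmt),
  map2_chr(unnamed, seq_along(unnamed), fmt)
)                            # TRUE
```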

`purrr::pmap()` iterates over multiple arguments in a list

pmap() applies a function to a list of arguments.

map2(x, y, f) is equivalent to pmap(list(x, y), f).
A data frame is a list, so it works well with pmap().

params <- tibble::tribble(
  ~n, ~min, ~max,
  1L, 0, 1,
  2L, 10, 100,
)

pmap(params, runif)

#> [[1]]
#> [1] 0.324684
#> 
#> [[2]]
#> [1] 86.14710 73.00569
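The map2()/pmap() equivalence, and the way pmap() matches data frame columns to arguments by name, can be sketched with deterministic inputs. The vectors `x` and `y` and the data frame `args` are made up for this example.

```r
library(purrr)

x <- c(1, 2, 3)
y <- c(10, 20, 30)

# map2() and pmap() with a two-element list give the same result.
same <- identical(map2(x, y, `+`), pmap(list(x, y), `+`))
same  # TRUE

# With named columns, pmap() matches arguments by name,
# so the column order does not have to follow the function signature.
args <- data.frame(times = c(2, 1), x = c("a", "b"), stringsAsFactors = FALSE)
repeated <- pmap_chr(args, strrep)  # calls strrep(times = 2, x = "a"), ...
repeated  # "aa" "b"
```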

`purrr::reduce()` combines vector elements with a binary function

“Reduces” a vector to a single value by repeatedly applying a two-argument function.
reduce(1:4, f) is equivalent to f(f(f(1, 2), 3), 4).

Example: Find the numbers that appear in every vector in a list.

set.seed(123)
lst <- map(1:4, ~ sample(1:10, 15, replace = TRUE))
reduce(lst, intersect)

#> [1] 10  5  9
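The nesting rule for reduce() can be verified on a small case. In this sketch `f` is an arbitrary two-argument function chosen for illustration, and the string example makes the left-to-right order visible.

```r
library(purrr)

# Nested calls written out by hand.
f <- function(acc, nxt) acc * nxt
by_hand <- f(f(f(1, 2), 3), 4)

# The same computation expressed with reduce().
with_reduce <- reduce(1:4, f)

by_hand      # 24
with_reduce  # 24

# A non-commutative function shows the order in which pairs are combined.
nested <- reduce(c("a", "b", "c", "d"), function(acc, nxt) paste0("(", acc, nxt, ")"))
nested       # "(((ab)c)d)"
```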

`purrr::accumulate()` shows intermediate results

Like reduce(), but returns all the intermediate results.
Great way to understand how reduce() works.

accumulate(lst, intersect)

#> [[1]]
#>  [1]  3  3 10  2  6  5  4  6  9 10  5  3  9  9  9
#> 
#> [[2]]
#> [1]  3 10  5  4  9
#> 
#> [[3]]
#> [1] 10  5  9
#> 
#> [[4]]
#> [1] 10  5  9

`purrr::accumulate()` is useful for cumulative calculations

accumulate(c(4, 3, 10), `+`)

#> [1]  4  7 17
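With `+`, accumulate() reproduces base R's cumsum(); other binary functions give other running summaries. A short sketch on the same input:

```r
library(purrr)

x <- c(4, 3, 10)

running_sum  <- accumulate(x, `+`)  # 4  7 17, same values as cumsum(x)
running_max  <- accumulate(x, max)  # 4  4 10
running_prod <- accumulate(x, `*`)  # 4 12 120

all(running_sum == cumsum(x))       # TRUE
```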

Predicate functionals apply a predicate to each element

Predicate: a function that returns a single TRUE or FALSE.

some() / every() / none(): Is the predicate TRUE for any / all / no elements?
detect() / detect_index(): Find the value / location of the first match.
keep() / discard(): Keep / drop all matching elements.

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
str(keep(df, is.numeric))

#> 'data.frame':    3 obs. of  1 variable:
#>  $ x: int  1 2 3

str(discard(df, is.numeric))

#> 'data.frame':    3 obs. of  1 variable:
#>  $ y: chr  "a" "b" "c"
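The remaining predicate functionals can be tried on a small mixed list. The list `x` below is made up for illustration.

```r
library(purrr)

x <- list(1:3, "a", c(TRUE, FALSE), 4.5)

some(x, is.character)          # TRUE: at least one element is character
every(x, is.numeric)           # FALSE: "a" and the logical vector are not numeric
none(x, is.null)               # TRUE: no element is NULL

detect(x, is.numeric)          # 1:3, the first numeric element
detect_index(x, is.character)  # 2, the position of the first character element

length(keep(x, is.numeric))    # 2: 1:3 and 4.5
length(discard(x, is.numeric)) # 2: "a" and c(TRUE, FALSE)
```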

`map_if()` and `modify_if()` transform elements where a predicate is true

E.g., calculate the mean of only the numeric columns in a data frame.

df <- data.frame(
  num1 = c(0, 10, 20),
  num2 = c(5, 6, 7),
  chr1 = c("a", "b", "c")
)

str(map_if(df, is.numeric, mean))

#> List of 3
#>  $ num1: num 10
#>  $ num2: num 6
#>  $ chr1: chr [1:3] "a" "b" "c"

str(modify_if(df, is.numeric, mean))

#> 'data.frame':    3 obs. of  3 variables:
#>  $ num1: num  10 10 10
#>  $ num2: num  6 6 6
#>  $ chr1: chr  "a" "b" "c"

`base::apply()` summarizes matrices and arrays

Collapses 1 or more matrix/array dimensions by applying a summary function.

apply(X, MARGIN, FUN): MARGIN is 1 for rows, 2 for columns.

a2d <- matrix(1:20, nrow = 5)
# Row means
apply(a2d, 1, mean)

#> [1]  8.5  9.5 10.5 11.5 12.5

# Column means
apply(a2d, 2, mean)

#> [1]  3  8 13 18

Warning: apply() coerces a data frame to a matrix, so columns of mixed types are converted to a common type!
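The effect of that coercion is easy to miss, so here is a small base R sketch. The data frame `df` is made up for this example.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# as.matrix() on mixed columns produces a character matrix,
# and that is what apply() ends up working on.
matrix_type <- typeof(as.matrix(df))
matrix_type  # "character"

# Each row therefore arrives as a character vector, including the integer column.
row_types <- apply(df, 1, function(row) typeof(row))
row_types    # "character" for every row

# A list functional works column by column and keeps the original types.
col_types <- vapply(df, typeof, character(1))
col_types    # x = "integer", y = "character"
```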

Base R has mathematical functionals

Base R includes several mathematical functionals.

integrate(): Find the area under a curve.
uniroot(): Find where a function equals zero.
optimise(): Find the location of the minimum (or maximum) value of a function.

integrate(sin, 0, pi)

#> 2 with absolute error < 2.2e-14
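The other two functionals follow the same pattern of taking a function and a search interval. The sketch below applies them to sin(); results are approximate because both use iterative numerical methods.

```r
# uniroot(): sin(x) crosses zero at pi within [pi / 2, 3 * pi / 2].
root <- uniroot(sin, interval = pi * c(1 / 2, 3 / 2))
root$root      # approximately 3.14

# optimise(): on [0, 2 * pi], sin(x) is lowest at 3 * pi / 2.
lo <- optimise(sin, interval = c(0, 2 * pi))
lo$minimum     # approximately 4.71
lo$objective   # approximately -1

# maximum = TRUE searches for the highest value instead (at pi / 2 on [0, pi]).
hi <- optimise(sin, interval = c(0, pi), maximum = TRUE)
hi$maximum     # approximately 1.57
```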
