
Advanced R Chapter 20: Evaluation

Daryn Ramsden

thisisdaryn at gmail dot com

last updated: 2020-08-06

1 / 38

Evaluation: What is this chapter about?

  • Evaluation: evaluating quoted expressions in custom environments to achieve specific goals.

  • Because we customise the evaluation environment, every form of evaluation covered here is non-standard.

  • Particular emphasis is placed on one type of non-standard evaluation (NSE): tidy evaluation.

  • To do tidy evaluation, we use functions from the rlang package.

2 / 38

The Basics: base::eval()

eval: evaluates an input expression in an input environment.

Arguments to eval:

  • expr: the expression you want to evaluate

  • envir: the environment you want to evaluate it in

library(rlang)
x <- 10
eval(rlang::expr(x))
[1] 10
y <- 2
eval(expr(x + y), env(x = 1000))
[1] 1002
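A minimal base-R sketch of the same idea, assuming no rlang: quote() stands in for expr(), and list2env() (or a plain list) stands in for env(). The variable names mirror the slide; the block itself is illustrative, not from the original deck.

```r
# Base-R sketch: quote() plays the role of rlang::expr(),
# list2env() plays the role of rlang::env()
x <- 10
y <- 2
e <- quote(x + y)

r_global <- eval(e)                  # x and y both found in the global env: 12
custom <- list2env(list(x = 1000))   # new env whose parent is the calling env (global here)
r_env <- eval(e, custom)             # x from custom, y from its parent: 1002
r_list <- eval(e, list(x = 1000))    # a list works too; y comes from the enclosure: 1002
```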
3 / 38

Something to observe

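The first argument of eval() is evaluated, not quoted. In the first call below, print(x + 1) runs in the global environment (where x is 10) before eval() ever sees it, so it prints 11 and eval() then returns 11.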

eval(print(x + 1), env(x = 1000))
[1] 11
[1] 11
eval(expr(print(x + 1)), env(x = 1000))
[1] 1001
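The same contrast without print()'s side effect, as a base-R sketch (add_one() is a hypothetical helper): only the quoted version lets eval() supply the value of x.

```r
x <- 10
add_one <- function(v) v + 1                       # hypothetical helper

early <- eval(add_one(x), list(x = 1000))         # add_one(x) runs first, in the global env: 11
late  <- eval(quote(add_one(x)), list(x = 1000))  # runs inside eval(), sees x = 1000: 1001
```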
4 / 38

Application: local()

local(): a base R function that lets you carry out a series of steps in a new environment.

  • common use case: carrying out a multi-step computation and disposing of intermediate data automatically
not_foo <- local({
x <- 10
y <- 200
x + y
})
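To check that the intermediates really are discarded, a quick base-R sketch (tmp_a and tmp_b are illustrative names): after local() returns, they are not visible from the global environment.

```r
total <- local({
  tmp_a <- 10      # intermediates live in a temporary environment
  tmp_b <- 200
  tmp_a + tmp_b    # only this value is returned
})
leaked <- exists("tmp_a")   # FALSE: tmp_a never existed outside local()'s environment
```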
5 / 38

Replicating local using eval

We can replicate local:

local2 <- function(expr) {
env <- env(caller_env())
eval(enexpr(expr), env)
}
foo <- local2({
x <- 10
y <- 200
x + y
})
foo
[1] 210
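For comparison, a base-R sketch of the same function, assuming substitute() and new.env() as stand-ins for rlang's enexpr() and env(); local_base() is a hypothetical name.

```r
# Base-R version of local2(): substitute() replaces enexpr(), new.env() replaces env()
local_base <- function(expr) {
  env <- new.env(parent = parent.frame())  # child of the caller's environment
  eval(substitute(expr), env)              # evaluate the captured block there
}

res <- local_base({
  tmp_c <- 10
  tmp_d <- 200
  tmp_c + tmp_d
})
leaked <- exists("tmp_c")   # FALSE
```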
6 / 38

Application: replicating source

Similarly, we can replicate source() in two ways:

source2 <- function(path, env = caller_env()) {
file <- paste(readLines(path, warn = FALSE), collapse = "\n")
exprs <- parse_exprs(file)
res <- NULL
for (i in seq_along(exprs)) {
res <- eval(exprs[[i]], env)
}
invisible(res)
}
source3 <- function(file, env = parent.frame()) {
lines <- parse(file) # creates an expression vector
res <- eval(lines, envir = env)
invisible(res)
}
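A usage sketch for source3() (redefined so the block runs on its own), writing a small script to a temporary file. eval() on an expression vector evaluates each element in turn and returns the value of the last one, which is what source3() relies on.

```r
source3 <- function(file, env = parent.frame()) {
  lines <- parse(file)             # an expression vector, one element per top-level call
  res <- eval(lines, envir = env)  # evaluates each element in turn, returns the last value
  invisible(res)
}

path <- tempfile(fileext = ".R")
writeLines(c("a <- 1", "b <- a + 41", "b * 2"), path)

target <- new.env()
res <- source3(path, env = target)        # 84, the value of the last expression
b_in_target <- get("b", envir = target)   # 42
b_leaked <- exists("b", envir = globalenv(), inherits = FALSE)  # FALSE
unlink(path)
```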
7 / 38

Quosures: what are they?

A quosure encapsulates:

  • an expression

  • the environment in which that expression should be evaluated

This coupling is so important that rlang provides a dedicated composite structure for it.
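A small sketch of why the coupling matters, assuming rlang is installed (make_quo() and secret are illustrative names): the quosure remembers the function environment where secret lives, while the bare expression does not.

```r
library(rlang)

make_quo <- function() {
  secret <- 42        # lives only in make_quo()'s execution environment
  quo(secret + 1)     # the quosure records that environment
}

q <- make_quo()
via_quosure <- eval_tidy(q)   # 43
# The bare expression has lost its environment, so `secret` cannot be found
via_expr <- tryCatch(eval(quo_get_expr(q)), error = function(e) NA)
```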

8 / 38


Quosures: How do you make them?

rlang provides 3 ways:

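  1. enquo() and enquos()

    • the recommended way: they capture a function argument together with its environment
  2. rlang::quo() and rlang::quos()

    • these exist to match expr() and exprs()
    • you probably don't need them
  3. new_quosure()

    • builds a quosure from an expression and an environment; useful for learning
    • you probably don't need this either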
9 / 38

Examples creating quosures

quosure_create <- function(x) enquo(x)
quosure_create(a + b)
<quosure>
expr: ^a + b
env: global
quosures_create <- function(x) enquos(x)
quosures_create(list(x = x ^ 2, y = y ^ 3, z = z ^ 4))
<list_of<quosure>>
[[1]]
<quosure>
expr: ^list(x = x^2, y = y^3, z = z^4)
env: global
10 / 38

Creating quosures using quo and new_quosure

Using quo

quo(x + y + z)
<quosure>
expr: ^x + y + z
env: global

Using new_quosure

new_quosure(expr(x + y), env(x = 1, y = 10))
<quosure>
expr: ^x + y
env: 0x7fc82bc14ac8
11 / 38

Quosures: Under the hood

super_quosure <- new_quosure(expr(x + y + z))
class(super_quosure) # are subclasses of formulas
[1] "quosure" "formula"
is_call(super_quosure) # are call objects
[1] TRUE
attr(super_quosure, ".Environment") #have a .Environment attribute
<environment: R_GlobalEnv>
get_expr(super_quosure) # an expression can be extracted
x + y + z
get_env(super_quosure) # an environment can be extracted
<environment: R_GlobalEnv>
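Because the expression and environment are separate components, they can be swapped independently. A sketch using rlang's quo_get_expr() and quo_set_env() accessors, assuming a reasonably recent rlang:

```r
library(rlang)

q <- new_quosure(expr(x + y), env(x = 1, y = 10))
v1 <- eval_tidy(q)                              # 11

q2 <- quo_set_env(q, env(x = 100, y = 10))      # same expression, new environment
v2 <- eval_tidy(q2)                             # 110

same_expr <- identical(quo_get_expr(q), quo_get_expr(q2))   # TRUE
```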
12 / 38

Tidy evaluation: What we really came for

A form of NSE utilizing 3 main features:

  • quasiquotation

    • talked about this last week
  • quosures
  • data masks

    • soon ...
13 / 38

eval_tidy(): the function that does the work

eval_tidy() takes two main arguments:

  1. an expression, usually a quosure

  2. a data mask (typically a data frame): the first place to look for variable definitions

Example: using eval_tidy() to find the body mass of the heaviest penguin in palmerpenguins::penguins:

library(palmerpenguins)
penguin_quosure <- quosure_create(max(body_mass_g, na.rm = TRUE))
# Now use the penguins data frame as data mask
eval_tidy(penguin_quosure, penguins)
[1] 6300
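A self-contained sketch that does not need palmerpenguins (the masses are made-up numbers): variables missing from the data mask fall back to the quosure's environment, and columns in the mask take precedence.

```r
library(rlang)

mask <- data.frame(mass = c(3750, 6300, 4200))   # made-up masses
threshold <- 4000
q <- quo(sum(mass > threshold))

# mass comes from the data mask, threshold from q's environment: 2
n_heavy <- eval_tidy(q, mask)

# A column called threshold takes precedence over the environment: sum(1:3 > 0) = 3
n_all <- eval_tidy(q, data.frame(mass = 1:3, threshold = 0))
```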
14 / 38

Example: replicating with

How with works:

library(palmerpenguins)
with(penguins, mean(body_mass_g, na.rm = TRUE))
[1] 4201.754

A new version of with:

with2 <- function(data, expr) {
expr <- enquo(expr)
eval_tidy(expr, data)
}
with2(penguins, mean(body_mass_g, na.rm = TRUE))
[1] 4201.754
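Because with2() captures a quosure, it also works when called from inside another function: names that aren't columns are looked up in that function's environment. A sketch, with scaled_mean() as a hypothetical helper:

```r
library(rlang)

with2 <- function(data, expr) {
  expr <- enquo(expr)
  eval_tidy(expr, data)
}

scaled_mean <- function(df, k) {   # hypothetical helper
  with2(df, mean(mass) * k)        # mass from df, k from scaled_mean()'s environment
}

out <- scaled_mean(data.frame(mass = c(2, 4)), 10)   # 30
```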
15 / 38

Another example: replicating subset

subset2 <- function(data, rows) {
rows <- enquo(rows)
rows_val <- eval_tidy(rows, data)
stopifnot(is.logical(rows_val))
data[rows_val, , drop = FALSE]
}
df <- subset2(penguins, species == "Adelie")
table(df$species)
Adelie Chinstrap Gentoo
152 0 0
16 / 38

Using pronouns to avoid ambiguity

The data mask provides two pronouns: .data and .env.

  • .data$x always refers to x in the data mask.
  • .env$x always refers to x in the environment
x <- 1
df <- data.frame(x = 2)
with2(df, .data$x)
[1] 2
with2(df, .env$x)
[1] 1

Neither .data nor .env is a column of df, yet this works: eval_tidy() makes both pronouns available inside the data mask, and they can be used to avoid ambiguity.

17 / 38

Why does that work?

  • .data and .env are actually exported from rlang

    • .data retrieves data-variables from the data frame

    • .env retrieves env-variables from the environment

  • They are not real data frames: they only act like them in some ways

    • you can't call names(.data) or map over it
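The .data pronoun also supports [[ indexing, which is handy when the column name arrives as a string. A sketch assuming rlang is installed (col_mean() is a hypothetical helper), plus the two pronouns combined:

```r
library(rlang)

col_mean <- function(df, col) {              # hypothetical helper
  eval_tidy(quo(mean(.data[[col]])), df)     # col is a string found in the function env
}
m <- col_mean(data.frame(a = 1:3, b = c(10, 20, 30)), "b")   # 20

x <- 1
both <- eval_tidy(quo(.data$x + .env$x), data.frame(x = 2))  # 2 + 1 = 3
```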
18 / 38

When is tidy evaluation actually beneficial?

A practical example:

resample <- function(df, n) {
idx <- sample(nrow(df), n, replace = TRUE)
df[idx, , drop = FALSE]
}

Suppose you want a new function that subsets and then resamples in a single step.

19 / 38

Example continued:

An approach that does not work:

subsample <- function(df, cond, n = nrow(df)) {
df <- subset2(df, cond)
resample(df, n)
}
df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
subsample(df, x == 1)
x y
2 1 2
3 1 3
1 1 1
3.1 1 3
5 2 5

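subsample() doesn’t quote any arguments, so cond is evaluated normally (not in a data mask). Because x <- 1 was already defined in the global environment on an earlier slide, x == 1 evaluates to a single TRUE, so every row is kept and rows with x == 2 leak into the result. Without a global x, we would instead get an error when R tries to find a binding for x.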

20 / 38

Example continued

An approach that does work

subsample <- function(df, cond, n = nrow(df)) {
cond <- enquo(cond)
df <- subset2(df, !!cond)
resample(df, n)
}
subsample(df, x == 1)
x y
1 1 1
2 1 2
1.1 1 1
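A self-contained sketch of both versions, assuming rlang is installed and that no global x exists (the block removes one if present): the naive version fails with a lookup error, while the quoting version keeps only rows with x == 1. Note that n = nrow(df) is evaluated lazily, after df has been reassigned to the subset, which is why the slide's output has 3 rows.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subsample_bad <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)    # cond is forced in the caller env, not in the data mask
  resample(df, n)
}

subsample_good <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)
  df <- subset2(df, !!cond)  # the captured expression is unquoted into subset2()
  resample(df, n)            # n is forced here, after df was replaced by the subset
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
if (exists("x", envir = globalenv(), inherits = FALSE)) rm("x", envir = globalenv())

bad <- tryCatch(subsample_bad(df, x == 1), error = conditionMessage)
set.seed(42)
good <- subsample_good(df, x == 1)
```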
21 / 38

Tidy evaluation handles ambiguity well

Consider a function meant to find all the rows of df where x is at least some threshold value:

threshold_x <- function(df, val) {
subset2(df, x >= val)
}

How can this go wrong?

  • if df has a column named val, that column is used instead of the val argument

  • if df has no column x but an x exists in the calling environment, that x is silently used instead

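Both failure modes can be reproduced with the naive version. A sketch, redefining subset2() so the block runs on its own; threshold_x_naive() is just the slide's function under a different name.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

threshold_x_naive <- function(df, val) {
  subset2(df, x >= val)
}

# Failure 1: df has no x column, so the global x (10) is used: 10 >= 2 keeps every row
x <- 10
no_x <- data.frame(y = 1:3)
r1 <- threshold_x_naive(no_x, 2)

# Failure 2: the val column (9:11) shadows the val argument, so no row passes
has_val <- data.frame(x = 1:3, val = 9:11)
r2 <- threshold_x_naive(has_val, 2)
```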
22 / 38

A more robust implementation

Here's a better implementation:

threshold_x <- function(df, val) {
subset2(df, .data$x >= .env$val)
}

Case 1: x is in the calling environment but not in df

no_x <- data.frame(y = 1:3)
x <- 10
threshold_x(no_x, 2)
Error: Column `x` not found in `.data`

Case 2: val is a column of df

has_val <- data.frame(x = 1:3, val = 9:11)
threshold_x(has_val, 2)
x val
2 2 10
3 3 11
23 / 38

NSE in base R

Two common patterns for NSE in base R:

  1. substitute() and evaluation in the caller environment using eval()

  2. match.call(), call manipulation, and evaluation in the caller environment

24 / 38

NSE Using substitute

substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env.
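A base-R sketch of both uses of substitute(): capturing an argument inside a function, and substituting from an explicit list (capture() is a hypothetical helper).

```r
capture <- function(arg) substitute(arg)   # hypothetical helper

captured <- capture(a + b * 2)          # a and b need not exist: nothing is evaluated
captured_is_call <- is.call(captured)   # TRUE
captured_txt <- deparse(captured)       # "a + b * 2"

# With an explicit env (here a list), bound names are replaced and others left alone
swapped_txt <- deparse(substitute(x + y, list(x = 1)))   # "1 + y"
```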

eval: evaluates an R expression. Its arguments are:

  • expr: an object to be evaluated.

  • envir: the environment in which expr is to be evaluated. May also be NULL, a list, a data frame, a pairlist or an integer as specified to sys.call.

  • enclos: Relevant when envir is a (pair)list or a data frame. Specifies the enclosure, i.e., where R looks for objects not found in envir.
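A sketch of how enclos changes lookup when envir is a list (f() is a hypothetical helper): the list only holds a, so b comes from the enclosure.

```r
b <- 100
data <- list(a = 1)   # envir as a list: it only provides a

f <- function() {     # hypothetical helper
  b <- 2
  # enclos defaults to parent.frame(), i.e. f()'s environment, so b is 2
  from_caller <- eval(quote(a + b), data)
  # with enclos set explicitly, b comes from the global environment instead
  from_global <- eval(quote(a + b), data, globalenv())
  c(from_caller = from_caller, from_global = from_global)
}

res <- f()
```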

25 / 38

NSE in base example:

How subset is used:

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
subset(sample_df, a >= 4)
a b c
4 4 2 4
5 5 1 1

A simplified version of how subset is implemented in base R:

subset_base <- function(data, rows) {
rows <- substitute(rows)
rows_val <- eval(rows, data, caller_env())
stopifnot(is.logical(rows_val))
data[rows_val, , drop = FALSE]
}
26 / 38

Problems with base::subset

  1. base::subset() always evaluates rows in the calling environment, but if ... has been used, the expression might need to be evaluated elsewhere.

    • this means you cannot reliably use subset() with functionals like map() or lapply()
  2. Calling subset() from another function requires some care: you have to use substitute() to capture a call to subset() containing the complete expression, and then evaluate it.

  3. eval() doesn’t provide any pronouns, so there’s no way to require part of the expression to come from the data.
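Problem 2 can be reproduced in base R. This sketch uses parent.frame() in place of rlang's caller_env() so it runs without rlang, and assumes no global variable named a; wrap_naive() and wrap_fixed() are hypothetical names.

```r
subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, parent.frame())   # parent.frame() stands in for caller_env()
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

# Naive wrapper: substitute() inside subset_base() only sees the symbol `cond`,
# whose promise is then evaluated in the global env, where `a` does not exist
wrap_naive <- function(df, cond) subset_base(df, cond)
err <- tryCatch(wrap_naive(sample_df, a >= 4), error = conditionMessage)

# Workaround: build the complete call to subset_base(), then evaluate it in the caller
wrap_fixed <- function(df, cond) {
  cl <- substitute(subset_base(df, cond))   # becomes subset_base(sample_df, a >= 4)
  eval(cl, parent.frame())
}
fixed <- wrap_fixed(sample_df, a >= 4)
```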

27 / 38

Alternative to subset: Using tidy evaluation

subset_tidy <- function(data, rows) {
rows <- enquo(rows)
rows_val <- eval_tidy(rows, data)
stopifnot(is.logical(rows_val))
data[rows_val, , drop = FALSE]
}
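A sketch of why the tidy version composes better, assuming rlang 0.4.0 or later for the {{ }} (embrace) operator; wrap_tidy() is a hypothetical name. Both a wrapper function and lapply() work without any substitute() gymnastics.

```r
library(rlang)

subset_tidy <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

# A wrapper just forwards the argument with {{ }}
wrap_tidy <- function(df, cond) subset_tidy(df, {{ cond }})
res <- wrap_tidy(sample_df, a >= 4)

# Inside lapply(), k is found through the quosure's environment
res_list <- lapply(3:4, function(k) subset_tidy(sample_df, a >= k))
```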
28 / 38

NSE with match.call: Background

match.call: returns a call in which all of the specified arguments are specified by their full names.

my_func <- function(input1, input2){
match.call()
}
my_func(input1 = 1)
my_func(input1 = 1)
my_func(input2 = 1, 2)
my_func(input1 = 2, input2 = 1)
my_func(input3 = 2)
Error in my_func(input3 = 2): unused argument (input3 = 2)
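match.call() returns an ordinary call object, so its pieces can be inspected and modified like a list. A base-R sketch using the slide's my_func():

```r
my_func <- function(input1, input2) {
  match.call()
}

cl <- my_func(input2 = 1, 2)   # my_func(input1 = 2, input2 = 1)
cl_names <- names(cl)          # "" "input1" "input2": arguments in formal order
first_arg <- cl$input1         # 2

cl$input2 <- 99                # calls can be modified like lists
cl_txt <- deparse(cl)          # "my_func(input1 = 2, input2 = 99)"
```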
29 / 38

Using match.call in NSE

Steps in using match.call to do NSE:

  1. Capture the complete call

  2. Modify it

  3. Evaluate the results


Example: a simplified write.csv

write.csv <- function(...) {
call <- match.call(write.table, expand.dots = TRUE)
call[[1]] <- quote(write.table)
call$sep <- ","
call$dec <- "."
eval(call, parent.frame())
}
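The same capture, modify, evaluate pattern on a smaller scale, as a base-R sketch: paste_csv() is a hypothetical wrapper that forces sep = "," on paste().

```r
paste_csv <- function(...) {
  cl <- match.call(paste, expand.dots = TRUE)  # 1. capture the call, matched against paste()
  cl[[1]] <- quote(paste)                      # 2. modify: call paste() instead ...
  cl$sep <- ","                                #    ... with a fixed separator
  eval(cl, parent.frame())                     # 3. evaluate in the caller's environment
}

out1 <- paste_csv("a", "b", "c")                     # "a,b,c"
out2 <- paste_csv(c("a", "b"), "x", collapse = "|")  # "a,x|b,x"
```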
30 / 38

Alternate implementation of write.csv

The same result can be achieved without NSE:

write.csv <- function(...) {
write.table(..., sep = ",", dec = ".")
}
31 / 38

Wrapping modeling functions

Simplest possible wrapper:

lm2 <- function(formula, data) {
lm(formula, data)
}
lm2(bill_length_mm ~ body_mass_g, penguins)
Call:
lm(formula = formula, data = data)
Coefficients:
(Intercept) body_mass_g
26.898872 0.004051
32 / 38

A better wrapper function

lm3 <- function(formula, data, env = caller_env()) {
formula <- enexpr(formula)
data <- enexpr(data)
lm_call <- expr(lm(!!formula, data = !!data))
expr_print(lm_call)
eval(lm_call, env)
}
lm3(bill_length_mm ~ body_mass_g, penguins)
lm(bill_length_mm ~ body_mass_g, data = penguins)
Call:
lm(formula = bill_length_mm ~ body_mass_g, data = penguins)
Coefficients:
(Intercept) body_mass_g
26.898872 0.004051
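A side-by-side sketch using the built-in mtcars data instead of penguins, so it runs without palmerpenguins (rlang is still required for lm3()): the recorded calls show what each wrapper captured.

```r
library(rlang)

lm2 <- function(formula, data) {
  lm(formula, data)
}

lm3 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  data <- enexpr(data)
  lm_call <- expr(lm(!!formula, data = !!data))
  expr_print(lm_call)
  eval(lm_call, env)
}

txt2 <- deparse(lm2(mpg ~ wt, mtcars)$call)   # "lm(formula = formula, data = data)"
txt3 <- deparse(lm3(mpg ~ wt, mtcars)$call)   # "lm(formula = mpg ~ wt, data = mtcars)"
```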
33 / 38

Things to note

There are 3 key steps:

  1. capture the unevaluated arguments using enexpr(), and capture the caller environment using caller_env().

  2. generate a new expression using expr() and unquoting.

  3. evaluate that expression in the caller environment.

Nice side-effect: Unquoting can be used to generate formulas

y <- expr(bill_length_mm)
x1 <- expr(body_mass_g)
x2 <- expr(species)
lm3(!!y ~ !!x1 + !!x2, penguins)
lm(bill_length_mm ~ body_mass_g + species, data = penguins)
Call:
lm(formula = bill_length_mm ~ body_mass_g + species, data = penguins)
Coefficients:
(Intercept) body_mass_g speciesChinstrap speciesGentoo
24.919471 0.003748 9.920884 3.557978
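Formulas can also be built from strings, for example when variable names come from user input. A sketch using rlang's sym() and syms() with base Reduce(); mtcars again stands in for penguins.

```r
library(rlang)

lm3 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  data <- enexpr(data)
  lm_call <- expr(lm(!!formula, data = !!data))
  expr_print(lm_call)
  eval(lm_call, env)
}

response <- sym("mpg")                 # a symbol built from a string
predictors <- syms(c("wt", "hp"))      # a list of symbols
# Fold the predictors into one expression: wt + hp
rhs <- Reduce(function(lhs, term) expr(!!lhs + !!term), predictors)

fit <- lm3(!!response ~ !!rhs, mtcars)
txt <- deparse(fit$call)   # "lm(formula = mpg ~ wt + hp, data = mtcars)"
```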
34 / 38

Potential problem situation: Mingling objects

Problem: What if you want a function that resamples before training the model?

Something that doesn't work:

resample_lm0 <- function(formula, data, env = caller_env()) {
formula <- enexpr(formula)
resample_data <- resample(data, n = nrow(data))
lm_call <- expr(lm(!!formula, data = resample_data))
expr_print(lm_call)
eval(lm_call, env)
}
df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))
resample_lm0(y ~ x, data = df)
lm(y ~ x, data = resample_data)
Error in is.data.frame(data): object 'resample_data' not found

lm_call is evaluated in the caller environment, but resample_data exists only inside resample_lm0()'s execution environment, so lm() cannot find it.

35 / 38

Example: Approach 1

Unquote the data frame itself into the call. This works, but the entire data frame gets inlined into the recorded call:

df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))
resample_lm1 <- function(formula, data, env = caller_env()) {
formula <- enexpr(formula)
resample_data <- resample(data, n = nrow(data))
lm_call <- expr(lm(!!formula, data = !!resample_data))
expr_print(lm_call)
eval(lm_call, env)
}
resample_lm1(y ~ x, data = df)$call
lm(y ~ x, data = <df[,2]>)
lm(formula = y ~ x, data = list(x = c(8L, 4L, 5L, 3L, 9L, 7L,
8L, 6L, 1L, 10L), y = c(28.19, 16.37, 18.49, 13.93, 30.69, 25.79,
28.19, 23.18, 7.81, 36.09)))
36 / 38

Example continued: a cleaner approach

A cleaner approach:

  1. create a new environment that inherits from the caller

  2. bind variables that you’ve created inside the function to that environment.

resample_lm2 <- function(formula, data, env = caller_env()) {
formula <- enexpr(formula)
resample_data <- resample(data, n = nrow(data))
lm_env <- env(env, resample_data = resample_data)
lm_call <- expr(lm(!!formula, data = resample_data))
expr_print(lm_call)
eval(lm_call, lm_env)
}
resample_lm2(y ~ x, data = df)
lm(y ~ x, data = resample_data)
Call:
lm(formula = y ~ x, data = resample_data)
Coefficients:
(Intercept) x
4.554 2.922
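A sketch checking that resample_lm2() keeps the recorded call short and does not leak resample_data into the caller, using mtcars instead of the slide's simulated df (resample() is redefined so the block runs on its own).

```r
library(rlang)

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

resample_lm2 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))
  lm_env <- env(env, resample_data = resample_data)  # child of the caller env
  lm_call <- expr(lm(!!formula, data = resample_data))
  expr_print(lm_call)
  eval(lm_call, lm_env)
}

set.seed(1)
fit <- resample_lm2(mpg ~ wt, data = mtcars)
txt <- deparse(fit$call)   # "lm(formula = mpg ~ wt, data = resample_data)"
leaked <- exists("resample_data", envir = globalenv(), inherits = FALSE)   # FALSE
n_rows <- nrow(fit$model)  # 32: one resampled row per original row
```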
37 / 38

Abrupt ending: Overall takeaways

  • There are many ways to do non-standard evaluation

  • Tidy evaluation is a good framework for applying NSE

38 / 38
