Advanced R Chapter 20: Evaluation

Daryn Ramsden
thisisdaryn at gmail dot com
last updated: 2020-08-06

Evaluation: What is this chapter about?

Evaluation: evaluating quoted expressions in custom environments to achieve specific goals.
The fact that we are customising the environments means all the forms of evaluation we go over are non-standard
Particular emphasis is placed on one type of Non-standard Evaluation (NSE), Tidy Evaluation
To do Tidy Evaluation, we make use of functions in the rlang package

The Basics: `base::eval()`

eval: evaluates an input expression in an input environment.

Arguments to eval:

expr: the expression you want to evaluate
envir: the environment you want to evaluate it in

library(rlang)
x <- 10
eval(rlang::expr(x))

[1] 10

y <- 2
eval(expr(x + y), env(x = 1000))

[1] 1002

Something to observe

The first argument of eval() is evaluated, not quoted.

eval(print(x + 1), env(x = 1000))

[1] 11

[1] 11

eval(expr(print(x + 1)), env(x = 1000))

[1] 1001
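Because eval() evaluates its first argument, an expression stored in a variable can be passed to it directly. A minimal base R sketch (the names stored_expr and result are illustrative and not from the slides):

```r
# quote() captures the expression without evaluating it
stored_expr <- quote(x + 1)

# eval() first evaluates `stored_expr` (which yields the call x + 1),
# then evaluates that call, looking up x in the supplied list
result <- eval(stored_expr, list(x = 1000))
result
```

Here result is 1001, because x is found in the list rather than in the calling environment.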

Application: `local()`

local(): a base R function that lets you carry out a series of steps in a new environment.

Common use case: carrying out a multi-step computation and disposing of intermediate data automatically.

not_foo <- local({
  x <- 10
  y <- 200
  x + y
})
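To see the "disposing of intermediate data" part in action, the sketch below checks that a temporary variable does not survive the call. The names total and tmp_a are illustrative, and the check assumes nothing called tmp_a already exists in the current environment:

```r
total <- local({
  tmp_a <- 10    # exists only inside the temporary environment
  tmp_b <- 200
  tmp_a + tmp_b  # the value of the last expression is returned
})

total
# Check only the current environment for a leftover binding
leaked <- exists("tmp_a", inherits = FALSE)
leaked
```

Only the final value is kept: total is 210 and leaked is FALSE.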

Replicating `local` using `eval`

We can replicate local:

local2 <- function(expr) {
  env <- env(caller_env())
  eval(enexpr(expr), env)
}
foo <- local2({
  x <- 10
  y <- 200
  x + y
})
foo

[1] 210

Application: replicating `source`

Similarly, we can replicate source:

source2 <- function(path, env = caller_env()) {
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  exprs <- parse_exprs(file)
  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }
  invisible(res)
}

source3 <- function(file, env = parent.frame()) {
  lines <- parse(file) # creates an expression vector
  res <- eval(lines, envir = env)
  invisible(res)
}
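A quick way to try the base R variant is to write a small script to a temporary file and evaluate it into a fresh environment. This is only a sketch: source3 is restated from above so the block is self-contained, and the script contents and the names script, target and result are made up for illustration.

```r
source3 <- function(file, env = parent.frame()) {
  exprs <- parse(file)             # expression vector, one entry per top-level expression
  res <- eval(exprs, envir = env)  # evaluates each in turn, returns the last value
  invisible(res)
}

# A throwaway script with three top-level expressions
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1", "b <- a + 1", "b * 10"), script)

# Evaluate into a dedicated environment so nothing leaks into the workspace
target <- new.env()
result <- source3(script, env = target)

result       # value of the last expression in the script
ls(target)   # bindings created by the script
```

The result is 20, and target ends up holding a and b.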

Quosures: what are they?

A quosure encapsulates:

an expression
an environment

This coupling is so important that rlang provides a composite structure for it.
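Why the coupling matters can be sketched with a quosure created inside a function. This assumes rlang is installed; make_quosure, with_env and without_env are illustrative names, not part of rlang or the slides.

```r
library(rlang)

make_quosure <- function() {
  x <- 100      # local to this function call
  quo(x + 1)    # captures the expression AND this function's environment
}

x <- 1
q <- make_quosure()

# eval_tidy() uses the environment stored in the quosure
with_env <- eval_tidy(q)

# Stripping the environment leaves a bare expression, which is
# evaluated here, where x is 1
without_env <- eval(get_expr(q))

with_env
without_env
```

The quosure still sees x = 100 and gives 101, while the bare expression falls back to the x visible at the point of evaluation and gives 2.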

Quosures: How do you make them?

rlang provides 3 ways:

enquo() and enquos()
- this is the best way
rlang::quo() and rlang::quos()
- these exist to match expr() and exprs().
- You probably don't need this
new_quosure()
- useful for learning
- You probably don't need this either

Examples creating quosures

quosure_create <- function(x) enquo(x)
quosure_create(a + b)

<quosure>
expr: ^a + b
env:  global

quosures_create <- function(x) enquos(x)
quosures_create(list(x = x ^ 2, y = y ^ 3, z = z ^ 4))

<list_of<quosure>>
[[1]]
<quosure>
expr: ^list(x = x^2, y = y^3, z = z^4)
env:  global
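The example above wraps everything in a single list() argument, so only one quosure comes back. A common alternative, sketched here assuming rlang is installed, is to capture several arguments through `...` so that each one becomes its own quosure. The function name capture_all and the argument names sq and cube are illustrative.

```r
library(rlang)

# enquos(...) returns one quosure per argument passed through the dots
capture_all <- function(...) enquos(...)

qs <- capture_all(sq = x^2, cube = y^3)

length(qs)         # number of captured arguments
names(qs)          # argument names are preserved
get_expr(qs$cube)  # the expression inside one quosure
```

This gives a named list of two quosures, and get_expr(qs$cube) returns the call y^3.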

Creating quosures using `quo` and `new_quosure`

Using `quo`

quo(x + y + z)

<quosure>
expr: ^x + y + z
env:  global

Using `new_quosure`

new_quosure(expr(x + y), env(x = 1, y = 10))

<quosure>
expr: ^x + y
env:  0x7fc82bc14ac8

Quosures: Under the hood

super_quosure <- new_quosure(expr(x + y + z))
class(super_quosure) # are subclasses of formulas

[1] "quosure" "formula"

is_call(super_quosure) # are call objects

[1] TRUE

attr(super_quosure, ".Environment") # have a .Environment attribute

<environment: R_GlobalEnv>

get_expr(super_quosure) # an expression can be extracted

x + y + z

get_env(super_quosure) # an environment can be extracted

<environment: R_GlobalEnv>

Tidy evaluation: What we really came for

A form of NSE utilizing 3 main features:

quasiquotation
- talked about this last week

quosures

data masks
- soon ...

`eval_tidy()`: the function that does the work

eval_tidy() takes two main arguments:

expr: a quosure (or expression) to evaluate
data: a data mask, typically a data frame, which is the first place to look for variable definitions

Example: using eval_tidy() to find the largest body mass in palmerpenguins::penguins:

library(palmerpenguins)
penguin_quosure <- quosure_create(max(body_mass_g, na.rm = TRUE))
# Now use the penguins data frame as data mask
eval_tidy(penguin_quosure, penguins)

[1] 6300
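The "first place to look" rule can be sketched with a small data frame instead of penguins. This assumes rlang is installed; df, df2, from_both and masked are illustrative names.

```r
library(rlang)

x <- 10

# y is found in the data mask, x falls through to the quosure's environment
df <- data.frame(y = 1:3)
from_both <- eval_tidy(quo(x + y), df)

# When the data frame also has an x column, the mask wins over the environment
df2 <- data.frame(x = 100, y = 1:3)
masked <- eval_tidy(quo(x + y), df2)

from_both
masked
```

from_both is 11 12 13, while masked is 101 102 103 because the x column shadows the variable x.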

Example: replicating `with`

How with works:

library(palmerpenguins)
with(penguins, mean(body_mass_g, na.rm = TRUE))

[1] 4201.754

A new version of with:

with2 <- function(data, expr) {
  expr <- enquo(expr)
  eval_tidy(expr, data)
}

with2(penguins, mean(body_mass_g, na.rm = TRUE))

[1] 4201.754

Another example: replicating `subset`

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}
df <- subset2(penguins, species == "Adelie")
table(df$species)


   Adelie Chinstrap    Gentoo 
      152         0         0

Using pronouns to avoid ambiguity

The data mask provides two pronouns: .data and .env.

.data$x always refers to x in the data mask.
.env$x always refers to x in the environment.

x <- 1
df <- data.frame(x = 2)

with2(df, .data$x)

[1] 2

with2(df, .env$x)

[1] 1

There's no reason that should work. But it does and can be used to avoid ambiguity.

Why does that work?

.data and .env are actually exported from rlang
- .data retrieves data-variables from the data frame
- .env retrieves env-variables from the environment
They are pronouns, not real data frames or environments: they just act like them sometimes
- you can't call names(.data) or map over it.
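Besides `$`, the .data pronoun also supports `[[`, which is handy when the column name arrives as a string. A minimal sketch, assuming rlang is installed; col_mean and scores are hypothetical names used only for illustration.

```r
library(rlang)

# Look up a column by name through the .data pronoun
col_mean <- function(data, var) {
  eval_tidy(quo(mean(.data[[var]])), data)
}

scores <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
mean_b <- col_mean(scores, "b")
mean_b
```

Because the lookup goes through .data, asking for a column that does not exist raises an error instead of silently picking up a variable of the same name from the environment.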

When is tidy evaluation actually beneficial?

A practical example:

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

You want to create a new function that resamples and subsamples in a single step

Example continued:

An approach that does not work:

subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}
df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
subsample(df, x == 1)

subsample() doesn’t quote any arguments so cond is evaluated normally (not in a data mask), and we get an error when it tries to find a binding for x.

Example continued

An approach that does work

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)
  df <- subset2(df, !!cond)
  resample(df, n)
}
subsample(df, x == 1)
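Putting the pieces together, the working version can be checked end to end. The definitions of subset2(), resample() and subsample() are restated from the slides so the block runs on its own (assuming rlang is installed); the seed and n = 4 are arbitrary choices for the illustration.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)        # quote the condition ...
  df <- subset2(df, !!cond)  # ... and unquote it into subset2()
  resample(df, n)
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
set.seed(1)  # arbitrary seed, only to make the draw repeatable
out <- subsample(df, x == 1, n = 4)
out
```

Every row of out has x equal to 1, and there are exactly four rows, drawn with replacement from the three matching rows of df.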

Tidy evaluation handles ambiguity well

Consider the function that is meant to find all the rows of df where x is at least some threshold value:

threshold_x <- function(df, val) {
  subset2(df, x >= val)
}

How can this go wrong?

if val is in df
if x is in the calling environment but not in df
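Both failure modes can be reproduced with a few lines. In this sketch (assuming rlang is installed) subset2() is restated from the slides, and the function is renamed threshold_x_naive, a hypothetical name, to keep it apart from the more robust version.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

threshold_x_naive <- function(df, val) {
  subset2(df, x >= val)
}

# Failure 1: a `val` column masks the `val` argument,
# so the comparison is x >= c(9, 10, 11) and nothing matches
has_val <- data.frame(x = 1:3, val = 9:11)
masked_rows <- threshold_x_naive(has_val, 2)

# Failure 2: no `x` column, so `x` is silently taken from the environment
# and the single TRUE is recycled to select every row
no_x <- data.frame(y = 1:3)
x <- 10
leaked_rows <- threshold_x_naive(no_x, 2)

nrow(masked_rows)
nrow(leaked_rows)
```

Neither call raises an error: the first quietly returns zero rows and the second quietly returns all three.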

A more robust implementation

Here's a better implementation:

threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}

Case 1: x is in the calling environment but not in df

no_x <- data.frame(y = 1:3)
x <- 10
threshold_x(no_x, 2)

Error: Column `x` not found in `.data`

Case 2: If val is in df

has_val <- data.frame(x = 1:3, val = 9:11)
threshold_x(has_val, 2)

  x val
2 2  10
3 3  11

NSE in base R

Two common patterns for NSE in base R:

substitute() and evaluation in the caller environment using eval()
match.call(), call manipulation, and evaluation in the caller environment

NSE Using `substitute`

substitute returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env.

eval: evaluates an R expression. Its arguments are:

expr: an object to be evaluated.
envir: the environment in which expr is to be evaluated. May also be NULL, a list, a data frame, a pairlist or an integer as specified to sys.call.
enclos: Relevant when envir is a (pair)list or a data frame. Specifies the enclosure, i.e., where R looks for objects not found in envir.
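The two functions combine as in the base R sketch below: substitute() captures the unevaluated argument, and eval() evaluates it with a data frame as envir. The names capture, captured and value are illustrative.

```r
# substitute() returns the expression supplied for `arg`, unevaluated
capture <- function(arg) substitute(arg)
captured <- capture(a + b)

df <- data.frame(a = 1:3)
b <- 10

# `a` is found in the data frame; `b` is not, so eval() falls back to
# the enclosure, which defaults to the environment eval() is called from
value <- eval(captured, df)

captured
value
```

captured is the call a + b, and value is 11 12 13.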

NSE in base example:

How subset is used:

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
subset(sample_df, a >= 4)

  a b c
4 4 2 4
5 5 1 1

A simplified version of how subset is implemented in base (caller_env() comes from rlang):

subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, caller_env())
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

Problems with `base::subset`

base::subset() always evaluates rows in the calling environment, but if ... has been used, the expression might need to be evaluated elsewhere.
- this means you cannot reliably use it with functionals like map() or lapply()
Calling subset() from another function requires some care: you have to use substitute() to capture the complete call to subset(), and then evaluate it.
eval() doesn’t provide any pronouns, so there’s no way to require part of the expression to come from the data.
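The first problem can be sketched by forwarding a condition through `...` from a wrapper that happens to define a variable with the same name as one the user refers to. The wrappers wrap_base and wrap_tidy are hypothetical, parent.frame() stands in for caller_env() in the base-style version, and the tidy version assumes rlang is installed.

```r
library(rlang)

# Base-style: evaluates in the data, then in the immediate caller
subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, parent.frame())
  data[rows_val, , drop = FALSE]
}

# Tidy-style: the quosure remembers where the expression was written
subset_tidy <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  data[rows_val, , drop = FALSE]
}

# Wrappers that forward the condition and define their own xval
wrap_base <- function(df, ...) { xval <- 3; subset_base(df, ...) }
wrap_tidy <- function(df, ...) { xval <- 3; subset_tidy(df, ...) }

my_df <- data.frame(x = 1:3, y = 3:1)
xval <- 1

base_out <- wrap_base(my_df, x == xval)  # sees the wrapper's xval
tidy_out <- wrap_tidy(my_df, x == xval)  # sees the user's xval

base_out
tidy_out
```

The base-style version returns the row where x is 3, because it finds the wrapper's xval; the tidy version returns the row where x is 1, as the user intended.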

Alternative to `subset`: Using tidy evaluation

subset_tidy <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

NSE with `match.call`: Background

match.call: returns a call in which all of the supplied arguments are matched to their full names.

my_func <- function(input1, input2){
  match.call()
}

my_func(input1 = 1)

my_func(input1 = 1)

my_func(input2 = 1, 2)

my_func(input1 = 2, input2 = 1)

my_func(input3 = 2)

Error in my_func(input3 = 2): unused argument (input3 = 2)

Using match.call in NSE

Steps in using match.call to do NSE:

Capture the complete call
Modify it
Evaluate the results

Example: `write.csv`

write.csv <- function(...) {
  call <- match.call(write.table, expand.dots = TRUE)
  call[[1]] <- quote(write.table)
  call$sep <- ","
  call$dec <- "."
  eval(call, parent.frame())
}

Alternate implementation of `write.csv`

It could have been done like this:

write.csv <- function(...) {
  write.table(..., sep = ",", dec = ".")
}

Wrapping modeling functions

Simplest possible wrapper:

lm2 <- function(formula, data) {
  lm(formula, data)
}

lm2(bill_length_mm ~ body_mass_g, penguins)


Call:
lm(formula = formula, data = data)
Coefficients:
(Intercept)  body_mass_g  
  26.898872     0.004051

A better wrapper function

lm3 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  data <- enexpr(data)
  lm_call <- expr(lm(!!formula, data = !!data))
  expr_print(lm_call)
  eval(lm_call, env)
}
lm3(bill_length_mm ~ body_mass_g, penguins)

lm(bill_length_mm ~ body_mass_g, data = penguins)


Call:
lm(formula = bill_length_mm ~ body_mass_g, data = penguins)
Coefficients:
(Intercept)  body_mass_g  
  26.898872     0.004051

Things to note

There are 3 key steps:

capture the unevaluated arguments using enexpr(), and capture the caller environment using caller_env().
generate a new expression using expr() and unquoting.
evaluate that expression in the caller environment.

Nice side-effect: Unquoting can be used to generate formulas

y <- expr(bill_length_mm)
x1 <- expr(body_mass_g)
x2 <- expr(species)
lm3(!!y ~ !!x1 + !!x2, penguins)

lm(bill_length_mm ~ body_mass_g + species, data = penguins)


Call:
lm(formula = bill_length_mm ~ body_mass_g + species, data = penguins)
Coefficients:
     (Intercept)       body_mass_g  speciesChinstrap     speciesGentoo  
       24.919471          0.003748          9.920884          3.557978

Potential problem situation: Mingling objects

Problem: What if you want a function that resamples before training the model?

Something that doesn't work:

resample_lm0 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))
  lm_call <- expr(lm(!!formula, data = resample_data))
  expr_print(lm_call)
  eval(lm_call, env)
}
df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))
resample_lm0(y ~ x, data = df)

lm(y ~ x, data = resample_data)

Error in is.data.frame(data): object 'resample_data' not found

lm_call is evaluated in env (the caller environment), but resample_data exists only in the execution environment of resample_lm0().

Example: Approach 1

Unquote the data frame into the call:

df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))
resample_lm1 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))
  lm_call <- expr(lm(!!formula, data = !!resample_data))
  expr_print(lm_call)
  eval(lm_call, env)
}
resample_lm1(y ~ x, data = df)$call

lm(y ~ x, data = <df[,2]>)

lm(formula = y ~ x, data = list(x = c(8L, 4L, 5L, 3L, 9L, 7L, 
8L, 6L, 1L, 10L), y = c(28.19, 16.37, 18.49, 13.93, 30.69, 25.79, 
28.19, 23.18, 7.81, 36.09)))

Example continued: a cleaner approach

A cleaner approach:

create a new environment that inherits from the caller
bind variables that you’ve created inside the function to that environment.

resample_lm2 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))
  lm_env <- env(env, resample_data = resample_data)
  lm_call <- expr(lm(!!formula, data = resample_data))
  expr_print(lm_call)
  eval(lm_call, lm_env)
}
resample_lm2(y ~ x, data = df)

lm(y ~ x, data = resample_data)


Call:
lm(formula = y ~ x, data = resample_data)
Coefficients:
(Intercept)            x  
      4.554        2.922

Abrupt ending: Overall takeaways

There are many ways to do non-standard evaluation
Tidy evaluation is a good framework for applying NSE
