Evaluation: evaluating quoted expressions in custom environments to achieve specific goals.
The fact that we are customising the environments means all the forms of evaluation we go over are non-standard.
Particular emphasis is placed on one type of Non-standard Evaluation (NSE): Tidy Evaluation.
To do Tidy Evaluation, we make use of functions in the rlang package.
base::eval()
eval(): evaluates an input expression in an input environment.
Arguments to eval():
expr: the expression you want to evaluate
envir: the environment you want to evaluate it in
library(rlang)
x <- 10
eval(rlang::expr(x))
[1] 10
y <- 2
eval(expr(x + y), env(x = 1000))
[1] 1002
The first argument of eval() is evaluated, not quoted.
eval(print(x + 1), env(x = 1000))
[1] 11
[1] 11
eval(expr(print(x + 1)), env(x = 1000))
[1] 1001
local()
local(): a base R function that allows you to carry out a series of steps in a new environment.
not_foo <- local({
  x <- 10
  y <- 200
  x + y
})
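As a small illustration of the isolation local() provides, the sketch below checks that bindings created inside the block do not leak into the surrounding environment. The variable names (total, step_a, step_b, leaked) are illustrative and not from the slides.

```r
# Intermediate values live only in the temporary environment created by local()
total <- local({
  step_a <- 10
  step_b <- 200
  step_a + step_b
})

# Only the final value of the block is returned
total

# The helper variables were never created outside the block
leaked <- exists("step_a", inherits = FALSE)
leaked
```

This is the main reason to reach for local(): the result is kept, and the scratch variables are discarded with the temporary environment.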
local using eval
We can replicate local():
local2 <- function(expr) {
  env <- env(caller_env())
  eval(enexpr(expr), env)
}
foo <- local2({
  x <- 10
  y <- 200
  x + y
})
foo
[1] 210
source
Similarly, we can replicate source():
source2 <- function(path, env = caller_env()) {
  # read the file into a single string
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  # parse the string into a list of expressions
  exprs <- parse_exprs(file)
  res <- NULL
  # evaluate each expression in turn, keeping the last result
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }
  invisible(res)
}
source3 <- function(file, env = parent.frame()) {
  lines <- parse(file) # creates an expression vector
  res <- eval(lines, envir = env)
  invisible(res)
}
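To see the pattern in action, here is a minimal sketch that uses only base R. The temporary file, the script contents, and the name source_sketch are assumptions made for illustration. It writes a two-line script to disk, evaluates it in a fresh environment, and inspects what was created.

```r
# Hypothetical helper mirroring source3(): parse the whole file, then evaluate
source_sketch <- function(file, env = parent.frame()) {
  exprs <- parse(file)  # expression vector, one element per top-level expression
  invisible(eval(exprs, envir = env))
}

# Write a tiny script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "a * 21"), script)

# Evaluate the script in its own environment so nothing else is touched
target <- new.env()
last_value <- source_sketch(script, env = target)

last_value                           # value of the final expression in the file
a_in_target <- get("a", envir = target)  # binding created by the script
unlink(script)
```

Passing an explicit environment makes it easy to check what a script defines without polluting the workspace.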
Quosures
A quosure is an encapsulation of:
expression
environment
This coupling is so important that rlang provides a composite structure for it: the quosure.
rlang provides 3 ways to create quosures:
enquo() and enquos(): capture user-supplied expressions
rlang::quo() and rlang::quos(): the quosure counterparts of expr() and exprs()
new_quosure(): builds a quosure from an expression and an environment
quosure_create <- function(x) enquo(x)
quosure_create(a + b)
<quosure>
expr: ^a + b
env:  global
quosures_create <- function(x) enquos(x)
quosures_create(list(x = x ^ 2, y = y ^ 3, z = z ^ 4))
<list_of<quosure>>

[[1]]
<quosure>
expr: ^list(x = x^2, y = y^3, z = z^4)
env:  global
quo and new_quosure
quo
quo(x + y + z)
<quosure>
expr: ^x + y + z
env:  global
new_quosure
new_quosure(expr(x + y), env(x = 1, y = 10))
<quosure>
expr: ^x + y
env:  0x7fc82bc14ac8
super_quosure <- new_quosure(expr(x + y + z))
class(super_quosure) # quosures are subclasses of formulas
[1] "quosure" "formula"
is_call(super_quosure) # are call objects
[1] TRUE
attr(super_quosure, ".Environment") # have a .Environment attribute
<environment: R_GlobalEnv>
get_expr(super_quosure) # an expression can be extracted
x + y + z
get_env(super_quosure) # an environment can be extracted
<environment: R_GlobalEnv>
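To make the expression-plus-environment pairing concrete, the following sketch creates a quosure inside a function and evaluates its parts afterwards. The function make_quosure and the variable multiplier are hypothetical names chosen for illustration. Because the quosure carries the environment it was created in, the function's local value is used rather than the one defined at the call site.

```r
library(rlang)

# The quosure is created inside the function, so it records that environment
make_quosure <- function() {
  multiplier <- 100
  quo(multiplier * 2)
}

# A different binding with the same name exists where the function is called
multiplier <- 1

q <- make_quosure()

# Evaluate the stored expression in the stored environment
value <- eval(get_expr(q), get_env(q))
value
```

Evaluating the bare expression without its environment would instead pick up the outer multiplier, which is the kind of mix-up quosures are designed to prevent.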
Tidy evaluation: a form of NSE utilizing 3 main features:
quasiquotation
quosures
data masks
eval_tidy(): the function that does the work
eval_tidy() takes two arguments:
a quosure
a data mask (data frame): first place to look for variable definitions
Example: using eval_tidy() to find the largest penguin (by mass) in palmerpenguins::penguins:
library(palmerpenguins)
penguin_quosure <- quosure_create(max(body_mass_g, na.rm = TRUE))
# Now use the penguins data frame as the data mask
eval_tidy(penguin_quosure, penguins)
[1] 6300
with
How with() works:
library(palmerpenguins)
with(penguins, mean(body_mass_g, na.rm = TRUE))
[1] 4201.754
A new version of with():
with2 <- function(data, expr) {
  expr <- enquo(expr)
  eval_tidy(expr, data)
}
with2(penguins, mean(body_mass_g, na.rm = TRUE))
[1] 4201.754
subset
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}
df <- subset2(penguins, species == "Adelie")
table(df$species)

   Adelie Chinstrap    Gentoo
      152         0         0
The data mask provides two pronouns: .data and .env.
x <- 1
df <- data.frame(x = 2)
with2(df, .data$x)
[1] 2
with2(df, .env$x)
[1] 1
Nothing in with2() defines .data or .env, but this works because the data mask supplies them, and they can be used to avoid ambiguity.
.data and .env are actually exported from rlang.
.data retrieves data-variables from the data frame.
.env retrieves env-variables from the environment.
They are not real data frames: they just act like them sometimes. For example, you can't take names(.data) or map over it.
A practical example:
resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}
You want to create a new function that resamples and subsamples in a single step.
An approach that does not work:
subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}
df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
subsample(df, x == 1)
    x y
2   1 2
3   1 3
1   1 1
3.1 1 3
5   2 5
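subsample() doesn't quote any arguments, so cond is evaluated normally (not in a data mask). If x has no binding in the calling environment, we get an error. Here a global x <- 1 was defined earlier, so x == 1 evaluates to TRUE and every row is kept, including rows where x is 2.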
An approach that does work:
subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)
  df <- subset2(df, !!cond)
  resample(df, n)
}
subsample(df, x == 1)
    x y
1   1 1
2   1 2
1.1 1 1
Consider the function that is meant to find all the rows of df where x is at least some threshold value:
threshold_x <- function(df, val) {
  subset2(df, x >= val)
}
How can this go wrong?
if val is in df
if x is in the calling environment but not in df
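Both failure modes can be reproduced with a short sketch. The name naive_threshold_x and the result variables are hypothetical, and subset2() is repeated so the block runs on its own. In each case the call completes without an error but returns the wrong rows.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Naive version: both `x` and `val` are looked up in the data mask first,
# then in the surrounding environments
naive_threshold_x <- function(df, val) {
  subset2(df, x >= val)
}

# Failure 1: `x` is not a column, so the `x` defined outside the data is used
x <- 10
no_x <- data.frame(y = 1:3)
res_no_x <- naive_threshold_x(no_x, 2)
nrow(res_no_x)      # every row is kept because 10 >= 2

# Failure 2: `val` is a column, so it masks the function argument
has_val <- data.frame(x = 1:3, val = 9:11)
res_has_val <- naive_threshold_x(has_val, 2)
nrow(res_has_val)   # no row is kept because x is never >= the val column
```

Silent wrong answers like these are harder to notice than errors, which is the motivation for being explicit about where each name should come from.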
Here's a better implementation:
threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}
Case 1: x is in the calling environment but not in df
no_x <- data.frame(y = 1:3)
x <- 10
threshold_x(no_x, 2)
Error: Column `x` not found in `.data`
Case 2: val is in df
has_val <- data.frame(x = 1:3, val = 9:11)
threshold_x(has_val, 2)
  x val
2 2  10
3 3  11
Two common patterns for NSE in base R:
substitute() and evaluation in the caller environment using eval()
match.call(), call manipulation, and evaluation in the caller environment
substitute
substitute(): returns the parse tree for the (unevaluated) expression expr, substituting any variables bound in env.
eval(): evaluates an R expression. Its arguments are:
expr: an object to be evaluated.
envir: the environment in which expr is to be evaluated. May also be NULL, a list, a data frame, a pairlist or an integer as specified to sys.call.
enclos: Relevant when envir is a (pair)list or a data frame. Specifies the enclosure, i.e., where R looks for objects not found in envir.
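The interplay of substitute(), envir and enclos can be sketched with base R alone. The names capture, offset and lookup_env below are hypothetical. The captured expression refers to one data frame column and one variable that is not in the data, so the second name has to be resolved through enclos.

```r
# Capture the unevaluated argument expression
capture <- function(arg) substitute(arg)
captured <- capture(a + offset)

df <- data.frame(a = 1:3)

# `offset` is not a column, so eval() falls back to the `enclos` environment
lookup_env <- new.env()
assign("offset", 10, envir = lookup_env)

# Columns of `df` are found first, everything else via `lookup_env`
result <- eval(captured, envir = df, enclos = lookup_env)
result
```

This is the base R counterpart of a data mask: the data frame is searched first, and enclos plays the role of the environment stored in a quosure.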
How subset() is used:
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))
subset(sample_df, a >= 4)
  a b c
4 4 2 4
5 5 1 1
The core of how subset() is implemented in base R:
subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, caller_env())
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}
base::subset() always evaluates rows in the calling environment, but if ... has been used, then the expression might need to be evaluated elsewhere. This means subset() cannot be used reliably with functionals like map() or lapply().
Calling subset() from another function requires some care: you have to use substitute() to capture a call to subset() as a complete expression, and then evaluate it.
eval() doesn't provide any pronouns, so there's no way to require part of the expression to come from the data.
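The "capture the complete call, then evaluate" workaround can be sketched as follows, using base R only. The wrapper name subset_wrapper and the variable cutoff are hypothetical, and subset_base() is rewritten here with parent.frame() in place of caller_env() so that the block has no package dependencies.

```r
# Simplified subset(), written with base R only
subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, parent.frame())
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Hypothetical wrapper: build the whole subset_base() call with the user's
# expressions substituted in, then evaluate it in the wrapper's caller
subset_wrapper <- function(df, cond) {
  call <- substitute(subset_base(df, cond))
  eval(call, parent.frame())
}

sample_df <- data.frame(a = 1:5, b = 5:1)
cutoff <- 4
kept <- subset_wrapper(sample_df, a >= cutoff)
kept
```

Calling subset_base(df, cond) directly inside the wrapper would capture the symbol cond rather than the user's expression, which is why the whole call has to be rebuilt first.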
subset: Using tidy evaluation
subset_tidy <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}
match.call: Background
match.call(): returns a call in which all of the specified arguments are specified by their full names.
my_func <- function(input1, input2) {
  match.call()
}
my_func(input1 = 1)
my_func(input1 = 1)
my_func(input2 = 1, 2)
my_func(input1 = 2, input2 = 1)
my_func(input3 = 2)
Error in my_func(input3 = 2): unused argument (input3 = 2)
Steps in using match.call() to do NSE:
Capture the complete call
Modify it
Evaluate the results
write.csv
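write.csv <- function(...) {
  # capture the call, matching arguments against write.table()
  call <- match.call(write.table, expand.dots = TRUE)
  # swap the function being called and fix the separators
  call[[1]] <- quote(write.table)
  call$sep <- ","
  call$dec <- "."
  # evaluate the modified call in the caller environment
  eval(call, parent.frame())
}
It could have been done like this:
write.csv <- function(...) {
  write.table(..., sep = ",", dec = ".")
}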
Simplest possible wrapper:
lm2 <- function(formula, data) {
  lm(formula, data)
}
lm2(bill_length_mm ~ body_mass_g, penguins)

Call:
lm(formula = formula, data = data)

Coefficients:
(Intercept)  body_mass_g
  26.898872     0.004051

lm3 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  data <- enexpr(data)
  lm_call <- expr(lm(!!formula, data = !!data))
  expr_print(lm_call)
  eval(lm_call, env)
}
lm3(bill_length_mm ~ body_mass_g, penguins)
lm(bill_length_mm ~ body_mass_g, data = penguins)

Call:
lm(formula = bill_length_mm ~ body_mass_g, data = penguins)

Coefficients:
(Intercept)  body_mass_g
  26.898872     0.004051
There are 3 key steps:
capture the unevaluated arguments using enexpr(), and capture the caller environment using caller_env().
generate a new expression using expr() and unquoting.
evaluate that expression in the caller environment.
Nice side-effect: Unquoting can be used to generate formulas
y <- expr(bill_length_mm)
x1 <- expr(body_mass_g)
x2 <- expr(species)
lm3(!!y ~ !!x1 + !!x2, penguins)
lm(bill_length_mm ~ body_mass_g + species, data = penguins)

Call:
lm(formula = bill_length_mm ~ body_mass_g + species, data = penguins)

Coefficients:
     (Intercept)       body_mass_g  speciesChinstrap     speciesGentoo
       24.919471          0.003748          9.920884          3.557978
Problem: What if you want a function that resamples before training the model?
Something that doesn't work:
resample_lm0 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))
  lm_call <- expr(lm(!!formula, data = resample_data))
  expr_print(lm_call)
  eval(lm_call, env)
}
df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))
resample_lm0(y ~ x, data = df)
lm(y ~ x, data = resample_data)
Error in is.data.frame(data): object 'resample_data' not found
lm_call is evaluated in env (the caller environment), but resample_data exists only inside the function's execution environment, so lm() cannot find it.
Unquote the data frame into the call:
df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))
resample_lm1 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))
  lm_call <- expr(lm(!!formula, data = !!resample_data))
  expr_print(lm_call)
  eval(lm_call, env)
}
resample_lm1(y ~ x, data = df)$call
lm(y ~ x, data = <df[,2]>)
lm(formula = y ~ x, data = list(x = c(8L, 4L, 5L, 3L, 9L, 7L, 8L, 6L, 1L, 10L), y = c(28.19, 16.37, 18.49, 13.93, 30.69, 25.79, 28.19, 23.18, 7.81, 36.09)))
A cleaner approach:
create a new environment that inherits from the caller
bind variables that you’ve created inside the function to that environment.
resample_lm2 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))
  lm_env <- env(env, resample_data = resample_data)
  lm_call <- expr(lm(!!formula, data = resample_data))
  expr_print(lm_call)
  eval(lm_call, lm_env)
}
resample_lm2(y ~ x, data = df)
lm(y ~ x, data = resample_data)

Call:
lm(formula = y ~ x, data = resample_data)

Coefficients:
(Intercept)            x
      4.554        2.922
There are many ways to do non-standard evaluation
Tidy evaluation is a good framework for applying NSE