Evaluation

Learning objectives:

  • Learn evaluation basics
  • Learn about quosures and data mask
  • Understand tidy evaluation
library(rlang)
library(purrr)

A bit of a recap

  • Metaprogramming: To separate our description of the action from the action itself - Separate the code from its evaluation.
  • Quasiquotation: combine code written by the function’s author with code written by the function’s user.
    • Unquotation: it gives the user the ability to evaluate parts of a quoted argument.
    • Evaluation: it gives the developer the ability to evluated quoted expression in custom environments.

Tidy evaluation: quasiquotation, quosures and data masks

Evaluation basics

We use eval() to evaluate, run, or execute expressions. It requires two arguments:

  • expr: the object to evaluate, either an expression or a symbol.
  • env: the environment in which to evaluate the expression or where to look for the values. Defaults to current env.
sumexpr <- expr(x + y)
x <- 10
y <- 40
eval(sumexpr)
#> [1] 50
eval(sumexpr, envir = env(x = 1000, y = 10))
#> [1] 1010

Application: reimplementing source()

What do we need?

  • Read the file being sourced.
  • Parse its expressions (quote them?)
  • Evaluate each expression saving the results
  • Return the results
source2 <- function(path, env = caller_env()) {
  file <- paste(readLines(path, warn = FALSE), collapse = "\n")
  exprs <- parse_exprs(file)

  res <- NULL
  for (i in seq_along(exprs)) {
    res <- eval(exprs[[i]], env)
  }

  invisible(res)
}

The real source is much more complex.

Quosures

quosures are a data structure from rlang containing both and expression and an environment

Quoting + closure because it quotes the expression and encloses the environment.

Three ways to create them:

  • Used mostly for learning: new_quosure(), creates a quosure from its components.
q1 <- rlang::new_quosure(expr(x + y), 
                         env(x = 1, y = 10))

With a quosure, we can use eval_tidy() directly.

rlang::eval_tidy(q1)
#> [1] 11

And get its components

rlang::get_expr(q1)
#> x + y
rlang::get_env(q1)
#> <environment: 0x000001f161f6d698>

Or set them

q1 <- set_env(q1, env(x = 3, y = 4))
eval_tidy(q1)
#> [1] 7
  • Used in the real world: enquo() o enquos(), to capture user supplied expressions. They take the environment from where they’re created.
foo <- function(x) enquo(x)
quo_foo <- foo(a + b)
get_expr(quo_foo)
#> a + b
get_env(quo_foo)
#> <environment: R_GlobalEnv>
  • Almost never used: quo() and quos(), to match to expr() and exprs().

Quosures and ...

Quosures are just a convenience, but they are essential when it comes to working with ..., because you can have each argument from ... associated with a different environment.

g <- function(...) {
  ## Creating our quosures from ...
  enquos(...)
}

createQuos <- function(...) {
  ## symbol from the function environment
  x <- 1
  g(..., f = x)
}
## symbol from the global environment
x <- 0
qs <- createQuos(global = x)
qs
#> <list_of<quosure>>
#> 
#> $global
#> <quosure>
#> expr: ^x
#> env:  global
#> 
#> $f
#> <quosure>
#> expr: ^x
#> env:  0x000001f15fc6f2e0

Other facts about quosures

Formulas were the inspiration for closures because they also capture an expression and an environment

f <- ~runif(3)
str(f)
#> Class 'formula'  language ~runif(3)
#>   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

There was an early version of tidy evaluation with formulas, but there’s no easy way to implement quasiquotation with them.

They are actually call objects

q4 <- new_quosure(expr(x + y + z))
class(q4)
#> [1] "quosure" "formula"
is.call(q4)
#> [1] TRUE

with an attribute to store the environment

attr(q4, ".Environment")
#> <environment: R_GlobalEnv>

Nested quosures

With quosiquotation we can embed quosures in expressions.

q2 <- new_quosure(expr(x), env(x = 1))
q3 <- new_quosure(expr(x), env(x = 100))

nq <- expr(!!q2 + !!q3)

And evaluate them

eval_tidy(nq)
#> [1] 101

But for printing it’s better to use expr_print(x)

expr_print(nq)
#> (^x) + (^x)
nq
#> (~x) + ~x

Data mask

A data frame where the evaluated code will look first for its variable definitions.

Used in packages like dplyr and ggplot.

To use it we need to supply the data mask as a second argument to eval_tidy()

q1 <- new_quosure(expr(x * y), env(x = 100))
df <- data.frame(y = 1:10)

eval_tidy(q1, df)
#>  [1]  100  200  300  400  500  600  700  800  900 1000

Everything together, in one function.

with2 <- function(data, expr) {
  expr <- enquo(expr)
  eval_tidy(expr, data)
}

But we need to create the objects that are not part of our data mask

x <- 100
with2(df, x * y)
#>  [1]  100  200  300  400  500  600  700  800  900 1000

Also doable with base::eval() instead of rlang::eval_tidy() but we have to use base::substitute() instead of enquo() (like we did for enexpr()) and we need to specify the environment.

with3 <- function(data, expr) {
  expr <- substitute(expr)
  eval(expr, data, caller_env())
}
with3(df, x*y)
#>  [1]  100  200  300  400  500  600  700  800  900 1000

Pronouns: .data$ and .env$

Ambiguity!!

An object value can come from the env or from the data mask

q1 <- new_quosure(expr(x * y + x), env = env(x = 1))
df <- data.frame(y = 1:5, 
                 x = 10)

eval_tidy(q1, df)
#> [1] 20 30 40 50 60

We use pronouns:

  • .data$x: x from the data mask
  • .env$x: x from the environment
q1 <- new_quosure(expr(.data$x * y + .env$x), env = env(x = 1))
eval_tidy(q1, df)
#> [1] 11 21 31 41 51

Application: reimplementing base::subset()

base::subset() works like dplyr::filter(): it selects rows of a data frame given an expression.

What do we need?

  • Quote the expression to filter
  • Figure out which rows in the data frame pass the filter
  • Subset the data frame
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))

  data[rows_val, , drop = FALSE]
}
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))

# Shorthand for sample_df[sample_df$b == sample_df$c, ]
subset2(sample_df, b == c)
#>   a b c
#> 1 1 5 5
#> 5 5 1 1

Using tidy evaluation

Most of the time we might not call it directly, but call a function that uses eval_tidy() (becoming developer AND user)

Use case: resample and subset

We have a function that resamples a dataset:

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}
resample(sample_df, 10)
#>     a b c
#> 4   4 2 4
#> 3   3 3 2
#> 1   1 5 5
#> 1.1 1 5 5
#> 3.1 3 3 2
#> 5   5 1 1
#> 5.1 5 1 1
#> 3.2 3 3 2
#> 5.2 5 1 1
#> 4.1 4 2 4

But we also want to use subset and we want to create a function that allow us to resample and subset (with subset2()) in a single step.

First attempt:

subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}
subsample(sample_df, b == c, 10)
#> Error: object 'b' not found

What happened?

subsample() doesn’t quote any arguments and cond is evaluated normally

So we have to quote cond and unquote it when we pass it to subset2()

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)

  df <- subset2(df, !!cond)
  resample(df, n)
}
subsample(sample_df, b == c, 10)
#>     a b c
#> 5   5 1 1
#> 5.1 5 1 1
#> 5.2 5 1 1
#> 1   1 5 5
#> 5.3 5 1 1
#> 1.1 1 5 5
#> 5.4 5 1 1
#> 5.5 5 1 1
#> 1.2 1 5 5
#> 1.3 1 5 5

Be careful!, potential ambiguity:

threshold_x <- function(df, val) {
  subset2(df, x >= val)
}

What would happen if x exists in the calling environment but doesn’t exist in df? Or if val also exists in df?

So, as developers of threshold_x() and users of subset2(), we have to add some pronouns:

threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}

Just remember:

As a general rule of thumb, as a function author it’s your responsibility to avoid ambiguity with any expressions that you create; it’s the user’s responsibility to avoid ambiguity in expressions that they create.

Base evaluation

Check 20.6 in the book!