Chapter 20 Evaluation

20.1 Introduction

There’s an immediate distinction between unquotation (user) and evaluation (developer). What are they?

Unquoting seems to be “evaluate this one part and then return the expression” evaluate means “give me the result of this whole expression”

20.2.1 `local`

This is such a cool function - where is it used in the wild?

When wanting to have variables stored in the environment of a function, or if you want to run an expression that produces bindings, but you don’t want those bindings to persist.

counter <- local({ 
  i <- 0; function() { 
    i <<- i + 1; print(i)
  }
})

counter()
counter()
counter()

[1] 1
[1] 2
[1] 3

We can also wrap for loops in a local call to avoid the object being iterated over getting assigned to the global environment

f1 <- function(x) {
  for (x in 1:10) {
    # do something
  }
  x
}

f2 <- function(x) {
  local({
    for (x in 1:10) {
      # do something
    }
  })
  x
}

f1("hello")
#> [1] 10
f2("hello")
#> [1] "hello"

20.2.2 `source`

Expression vectors get a shout out again (they are an aside in the Expression chapter) and while Hadley advises against introducing another data structure, I wanted to make sure my high level understanding was correct: eval can be vectorized if the input is of type expression vector [but not if it’s of type list]?

exp_vec <- expression(x <- 4, x)
eval(exp_vec)

exp_list <- list2(
  rlang::expr(x <- 4),
  rlang::expr(x)
)
eval(exp_list)

20.2.3 `function`

I think I understand this issue based on the concrete example but can we summarize the “gotcha” here?

It’s basically that the printing of the function is a lie (because base R doesn’t get it). And that’s dangerous and confusing, so use rlang to make it not a lie.

x <- 10
y <- 20
f <- eval(expr(function(x, y) !!x + !!y))
f
f()
attributes(f)

attr(f, "srcref") <- NULL
f

The REAL function is 10 + 20, but the initial srcref keeps the !!x and !!y, which is meaningless. This might help explain why there’s a problem:

x <- 10
y <- 20
f <- eval(expr(function(x, y) !!x + !!y))
f
f()

x <- 2000
y <- 3000
f()
#> [1] 30
attributes(f)
#> $srcref
#> function(x, y) !!x + !!y
attr(f, "srcref") <- NULL
f
#> function (x, y) 
#> 10 + 20
#> <bytecode: 0x000000001421eb90>

20.2.4.5 Exercises

Can we also go over what is happening in local3?

local3 <- function(expr, envir = new.env()) {
  call <- substitute(eval(quote(expr), envir))
  print(call)
  eval(call, envir = parent.frame())
}

local3({
  x <- 10
  x * 2
})

exists("x")

It creates a new environment with the calling environment as its parent and then evaluates the expression in that environment.

You are evaluating the call in the execution environment of local3. Its confusing because it’s written as eval(call, envir = parent.frame()) but it is important to remember that parent.frame is evaluated in the execution environment of eval.

if instead you had:

pf <- parent.frame()
eval(call, envir = pf)

you would get something else. Because in this case, parent.frame() is evaluated in the execution environment of local3 and thus returns .GlobalEnv (presumably you run it from the global env).

20.3.3 Dots

I have no idea what’s happening in this section. Can we go over the point/what this code is showing us?

f <- function(...) {
  x <- 1
  g(..., f = x)
}

g <- function(...) {
  enquos(...)
}

x <- 0
qs <- f(global = x)
qs

As a preliminary, I note that the following code also illustrates the point:

f <- function(...) {
  x <- 1
  g(..., f = x)
}
g <- function(...) {
  enquos(...)
}
x <- 0
qs <- f(x)
qs

I think what he’s trying to say is that if you didn’t have quosures you would have to create a list of environments to match the list of expressions if you wanted your expressions to contain the same names but not necessarily have those names have the same meaning.

But here we have a magical situation where each quosure has the same expression but each environment is different and thus each x will be evaluated differently when the time comes.

I think the real barrier to truly getting it is figuring out a plausible example of wanting the same name to have different meanings in your list of expressions.. and I’m at a loss for that at the moment.

20.3.4 Under the hood

Unfortunately, however, there is no clean way to make ~ a quasiquoting function.

Can we come up with a (broken) example of trying to quasiquote an object of type formula? I think seeing this fail will help me to understand its limitations

To do this you would need to recreate the way rlang worked when quosures WERE formulas, probably by pulling the commit prior to the rewrite off of github. It looks like the old method would encode the expr as a formula and then overload the tilde operator so that it does tidy eval instead of..whatever it does normally

20.3.6.2 Exercises

What is going on here??

enenv <- function(x){
  get_env(enquo(x))
}

# Test
capture_env <- function(x){
  enenv(x)
}

enenv(x)
#> <environment: R_GlobalEnv>
capture_env(x)  # functions execution environment is captured
#> <environment: 0x60a2538>

The enquo is capturing the environment of the x being passed into enenv, so it returns a different environment (the one inside capture_env) than running enenv(x) on its own

20.4.1 Basics

I am clearly missing something here. What is so special about multiplying across a vector? Is it that data masks allow us to just write y instead of df$y?

I think the simplicity of the multiplication may be what he’s going for: the reader can focus on the language feature provided without the distraction of a complex computation.

20.4.3 `subset`

What is the eval_tidy doing here?

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  # creates a quosure out of the enquoted rows
  # and uses the data as its environment
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))

  # then subsets the data
  data[rows_val, , drop = FALSE]
}

df <- subset2(palmerpenguins::penguins, species == "Adelie")
table(df$species)

20.4.5 `transform`

Can we say in words what this one is doing or comment all the lines?

transform2 <- function(.data, ...) {
  
  # create quosures for all the supplied arguments and values
  dots <- enquos(...)

  # for all argument value pairs
  for (i in seq_along(dots)) {
    # get the names of the arguments
    name <- names(dots)[[i]]
    # get their values
    dot <- dots[[i]]

    # add column with names equal to argument names
    # eval_tidy and values (single column) equal to the evaluated quosure (dot)"
    .data[[name]] <- eval_tidy(dot, .data)
  }

  .data
}

20.5.2 Handling ambiguity

There are subtle differences in when val is evaluated. If you unquote, val will be early evaluated by enquo(); if you use a pronoun, val will be lazily evaluated by eval_tidy(). These differences are usually unimportant, so pick the form that looks most natural.

Is there a case where this subtle distinction between .env$val vs !!val matters?

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  # change val from 2 to 3, breaking things
  env_bind(caller_env(), val = 3)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

threshold_prefix <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}

threshold_bangbang <- function(df, val) {
  subset2(df, .data$x >= !!val)
}

df <- data.frame(x = 1:3, val = 9:11)
threshold_prefix(df, 2) == threshold_bangbang(df, 2)

In the !! case, val is evaluated early, (inside of threshold_x) whereas the .env case evaluated later (in the eval_tidy). This could cause problems if, for example, the val in threshold_x was altered after subset2 was called, but before the eval_tidy

Why do we need to {{ cond }} in the following part of the chapter?

subset2 <- function(data, rows) {
  rows <- rlang::enquo(rows)
  rlang::env_bind(rlang::caller_env(), val = 3)
  rows_val <- rlang::eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, {{cond}})
  resample(df, n)
}

df <- data.frame(x = c(1,1,1,2,2), y = 1:5)
subsample(df, x == 1)

When it appears unquoted in the call to subset2, it loses its… history, basically. The rlang stuff protects it so R basically doesn’t know it exists until it needs to. It keeps its evaluation as lazy as is needed. Because of laziness, {{ cond }} gets passed to subset2, which ~says “Ok, but what was cond when it was passed to you?”, and subsample ~says “Huh, I dunno, I’ll ask my calling environment,” and then x == 1 makes it into subset2 without being evaluated in the subsample environment. Basically.

20.6.1 substitute()

Why exactly can’t we use subset with map?

This works, not sure why the book says it wouldn’t….

subset_base <- function(data, rows) {
  rows <- substitute(rows)
  rows_val <- eval(rows, data, caller_env())
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

local({
  zzz <- 2
  dfs <- list(data.frame(x = 1:3), data.frame(x = 4:6))
  map(dfs, ~subset_base(.x, x == zzz))
})

Calling subset() from another function requires some care: you have to use substitute() to capture a call to subset() complete expression

Why?!

No reason to expect this to work:

test <- function(df, expr) {
  subset(df, expr)
}

test(mtcars, cyl > 4)

#> Error in eval(e, x, parent.frame()): object 'cyl' not found

substitute only looks at the expression from which it was called. this doesn’t work because subset sees substitute(expr), not cyl > 4

test <- function(df, expr) {
  subset(df, substitute(expr))
}
test(mtcars, cyl > 4)

#> Error in subset.data.frame(df, substitute(expr)): 'subset' must be logical

so instead, we are going to build the call to subset, substituting in what was passed as arguments, and then eval the call.

test <- function(df, expr) {
  subset_call <- substitute(subset(df, expr))
  print(subset_call)
  eval(subset_call)
}
test(mtcars, cyl > 4)

#> subset(mtcars, cyl > 4)

20.6.2 `match.call`

Can we speak more about the differences in these two functions and their pros and cons?

resample_lm2 <- function(formula, data, env = caller_env()) {
  formula <- enexpr(formula)
  resample_data <- resample(data, n = nrow(data))

  lm_env <- env(env, resample_data = resample_data)
  lm_call <- expr(lm(!!formula, data = resample_data))
  expr_print(lm_call)
  eval(lm_call, lm_env)
}
resample_lm2(y ~ x, data = df)

resample_lm0 <- function(
  formula, data,
  resample_data = data[sample(nrow(data), replace = TRUE), ,
                       drop = FALSE],
  env = current_env()
) {
  formula <- enexpr(formula)

  lm_call <- expr(lm(!!formula, data = resample_data))
  expr_print(lm_call)
  eval(lm_call, env)
}

df <- data.frame(x = 1:10, y = 5 + 3 * (1:10) + round(rnorm(10), 2))
(resamp_lm1 <- resample_lm0(y ~ x, data = df))

I think these two functions should work identically assuming formula and data are the only two inputs that are set by the user. (The results will differ unless a seed is set.)

I suppose the resample_lm0() here provides more flexibility to the user (compared to resample_lm2(), i.e. the user can specify resample_data (since it’s an argument) instead of being “forced” to use the custom function resample(), which is embedded in the body of resample_lm2().

Downside to resample_lm0 is that there might be too much flexibility. Yes, resample_data is customizable, but so is env. What if the user changes env = current_env() to env = caller_env()? Then the function breaks.

Is the closest equivalent to deparse(substitute(x)) with rlang expr_text(enexpr(x)) (assuming this is in a function, hence the en in enexpr)?

penguins <- palmerpenguins::penguins
add_bleh <- function(df, nm = deparse(substitute(df))) {
  col_out <- sym(sprintf('new_%s_col', nm))
  df %>% 
    mutate(!!col_out := 'bleh')
}
add_bleh(penguins) %>% slice(1) %>% glimpse()

## Rows: 1
## Columns: 9
## $ species           <fct> Adelie
## $ island            <fct> Torgersen
## $ bill_length_mm    <dbl> 39.1
## $ bill_depth_mm     <dbl> 18.7
## $ flipper_length_mm <int> 181
## $ body_mass_g       <int> 3750
## $ sex               <fct> male
## $ year              <int> 2007
## $ new_penguins_col  <chr> "bleh"

In this thread Lionel says "enexpr should almost never be used. So when should it?

Seems like when wrapping a base NSE function like lm or trying to deparse(substitute(x)) using rlang

How do you capture the unquoting operator !! without evaluating it in the rlang framework:

reflect_lang <- function(lang) {
  r <- rlang::enexpr(lang)
  r
}

reflect_lang(x+y)

x + y

reflect_lang(!!x)

# Error in rlang::enexpr(lang): object 'x' not found

reflect_lang(quote(!!x))

# Error in rlang::enexpr(lang): object 'x' not found

Desired output: !!x

reflect_rlang <- function(expr) {
  substitute(expr)
}
reflect_rlang(!!x)

!!x