Quasiquotation

Learning objectives:

What quasiquotation means
Why it’s important
Learn some practical uses

library(rlang)
library(purrr)

Introduction

Three pillars of tidy evaluation

Quasiquotation
Quosures (Chapter 20)
Data masks (Chapter 20)

Quasiquotation = quotation + unquotation

Quote. Capture an unevaluated expression (“defuse”)
Unquote. Evaluate selected parts of a quoted expression (“inject”)
Functions that use these features are said to use non-standard evaluation (NSE)
Note: related to Lisp macros, and also exists in other languages with Lisp heritage, e.g. Julia
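
A minimal sketch of the two halves, assuming only rlang is loaded. The variable names (quoted, injected) are illustrative and not from the chapter.

```r
library(rlang)

x <- 10

# Quote ("defuse"): capture the expression without evaluating it.
quoted <- expr(x + 1)

# Unquote ("inject"): evaluate x now and insert its value into the expression.
injected <- expr(!!x + 1)

print(quoted)    # x + 1
print(injected)  # 10 + 1

# Changing x later only affects the expression that still refers to x.
x <- 100
eval(quoted)     # 101
eval(injected)   # 11
```

The quoted expression looks x up when it is evaluated, while the injected one had the value fixed when the expression was built.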

On its own, quasiquotation is useful for programming, but combined with other tools it becomes important for data analysis.

Motivation

Simple concrete example:

cement() is a function that works like paste() but doesn’t need quotes

(Think of automatically adding ‘quotes’ to the arguments)

cement <- function(...) {
  args <- ensyms(...)
  paste(purrr::map(args, as_string), collapse = " ")
}

cement(Good, morning, Hadley)

#> [1] "Good morning Hadley"

What if we wanted to use variables? What is an object and what should be quoted?

This is where ‘unquoting’ comes in!

name <- "Bob"
cement(Good, afternoon, !!name) # Bang-bang!

#> [1] "Good afternoon Bob"

Vocabulary

Can think of cement() and paste() as being ‘mirror-images’ of each other.

paste() - define what to quote - Evaluates arguments
cement() - define what to unquote - Quotes arguments

“Quoting function” is similar to, but more precise than, “function that uses non-standard evaluation (NSE)”

Tidyverse functions - e.g., dplyr::mutate(), tidyr::pivot_longer()
Base functions - e.g., library(), subset(), with()

Arguments to a quoting function cannot be evaluated outside of the function:

cement(Good, afternoon, Cohort) # No problem

#> [1] "Good afternoon Cohort"

Good      # Error!

#> Error: object 'Good' not found

Non-quoting (standard) function arguments can be evaluated:

paste("Good", "afternoon", "Cohort")

#> [1] "Good afternoon Cohort"

"Good"

#> [1] "Good"

Quoting

Capture expressions without evaluating them

	Developer	User
Expression (Quasiquotation)
One	`expr()`	`enexpr()`
Many	`exprs()`	`enexprs()`
Symbol (Quasiquotation)
One	`expr()`	`ensym()`
Many	`exprs()`	`ensyms()`
Base R (Quotation)
One	`quote()`	`substitute()`
Many	`alist()`	`as.list(substitute(...()))`

Non-base functions are from rlang
Developer - From you, direct, fixed, interactive
User - From the user, indirect, varying, programmatic

Also:

bquote() provides a limited form of quasiquotation
~, the formula, is a quoting function (see Section 20.3.4)
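
A short base-R sketch of both points, using no packages. bquote() unquotes with .() rather than !!, and a formula keeps its two sides unevaluated. The variable names are illustrative only.

```r
x <- 5

# bquote() quotes its argument but evaluates anything wrapped in .()
b <- bquote(.(x) + y)
print(b)           # 5 + y

# A formula captures its two sides without evaluating them
f <- y ~ x + z
class(f)           # "formula"
f[[2]]             # y      (left-hand side)
f[[3]]             # x + z  (right-hand side)
```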

`expr()` and `exprs()`

expr(x + y)

#> x + y

exprs(exp1 = x + y, exp2 = x * y)

#> $exp1
#> x + y
#> 
#> $exp2
#> x * y

`enexpr()` and `enexprs()`

f <- function(x) enexpr(x)
f(a + b + c)

#> a + b + c

f2 <- function(x, y) enexprs(exp1 = x, exp2 = y)
f2(x = a + b, y = c + d)

#> $exp1
#> a + b
#> 
#> $exp2
#> c + d

`ensym()` and `ensyms()`

Remember: a symbol represents the name of an object and can only be length 1.
These are stricter than enexpr() and enexprs(): they throw an error if given anything other than a symbol or a string.

f <- function(x) ensym(x)
f(a)

#> a

f2 <- function(x, y) ensyms(sym1 = x, sym2 = y)
f2(x = a, y = "b")

#> $sym1
#> a
#> 
#> $sym2
#> b
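
To see the strictness in practice, here is a small sketch. The helper names capture_expr() and capture_sym() are made up for illustration, and tryCatch() is used only so the script keeps running after the error.

```r
library(rlang)

capture_expr <- function(x) enexpr(x)
capture_sym  <- function(x) ensym(x)

# enexpr() accepts any expression
capture_expr(a + b)

# ensym() accepts a symbol or a string (the string becomes a symbol)
capture_sym(a)
capture_sym("a")

# ...but a call such as a + b is rejected with an error
result <- tryCatch(
  capture_sym(a + b),
  error = function(e) "error"
)
result
```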

Unquoting

Selectively evaluate parts of an expression

Merges ASTs with template
1 argument !! (unquote, bang-bang)
- Unquoting a function call evaluates and returns results
- Unquoting a function (name) replaces the function (alternatively use call2())
>1 arguments !!! (unquote-splice, bang-bang-bang, triple bang)
!! and !!! only behave this way inside quoting functions that use rlang

Basic unquoting

One argument

x <- expr(a + b)
y <- expr(c / d)

expr(f(x, y))      # No unquoting
#> f(x, y)
expr(f(!!x, !!y))  # Unquoting
#> f(a + b, c/d)

Multiple arguments

z <- exprs(a + b, c + d)
w <- exprs(exp1 = a + b, exp2 = c + d)

expr(f(z))      # No unquoting
#> f(z)
expr(f(!!!z))   # Unquoting
#> f(a + b, c + d)
expr(f(!!!w))   # Unquoting when named
#> f(exp1 = a + b, exp2 = c + d)

Special usages or cases

For example, get the AST of an expression

lobstr::ast(x)
#> x
lobstr::ast(!!x)
#> █─`+` 
#> ├─a 
#> └─b

Unquote function call

expr(f(!!mean(c(100, 200, 300)), y))
#> f(200, y)

Unquote function

f <- expr(sd)
expr((!!f)(x))
#> sd(x)
expr((!!f)(!!x + !!y))
#> sd(a + b + c/d)
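
The call2() alternative mentioned earlier can be sketched as follows. It builds the same calls without the extra parentheses around !!f. The objects are redefined here so the block runs on its own.

```r
library(rlang)

f <- expr(sd)
x <- expr(a + b)
y <- expr(c / d)

# call2() takes the function (symbol, string, or function) and then its arguments
call_one  <- call2(f, x)
call_both <- call2(f, expr(!!x + !!y))
call_named <- call2("mean", x, na.rm = TRUE)

print(call_one)    # sd(a + b)
print(call_both)   # sd(a + b + c/d)
print(call_named)  # mean(a + b, na.rm = TRUE)
```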

Non-quoting

In base R, only bquote() provides a limited form of quasiquotation.

The rest of base R does not unquote. Instead, it selectively turns quoting off.

Four basic forms of quoting/non-quoting:

Pair of functions - Quoting and non-quoting
- e.g., $ (quoting) and [[ (non-quoting)
Pair of Arguments - Quoting and non-quoting
- e.g., rm(...) (quoting) and rm(list = c(...)) (non-quoting)
Arg to control quoting
- e.g., library(rlang) (quoting) and library(pkg, character.only = TRUE) (where pkg <- "rlang")
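Quote if evaluation fails
- help(var) (where var is not defined) - Evaluation fails, so quotes and shows help for var
- help(var) (where var <- "mean") - Evaluates to a string, so shows help for mean
- help(var) (where var <- 10) - Evaluation does not give a string, so quotes and shows help for var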
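
The first three forms can be tried with base R only. This is an illustrative sketch: the list x, the environment env, and the choice of the stats package are arbitrary.

```r
# Pair of functions: `$` quotes its argument, `[[` evaluates it
x <- list(var = 1, y = 2)
var <- "y"
x$var       # 1: looks up the element literally named "var"
x[[var]]    # 2: evaluates var and looks up the element named "y"

# Pair of arguments: rm(...) quotes, rm(list = ) evaluates
env <- new.env()
assign("a", 1, envir = env)
assign("b", 2, envir = env)
assign("c", 3, envir = env)
rm(a, envir = env)                 # quoting: removes the object named a
to_remove <- "b"
rm(list = to_remove, envir = env)  # non-quoting: removes the object named in to_remove
ls(env)                            # "c"

# Argument that controls quoting
pkg <- "stats"
library(pkg, character.only = TRUE)
```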

… (dot-dot-dot) [When using … with quoting]

Sometimes we need to supply an arbitrary list of expressions or arguments to a function (...)
But we need a way to use these when we don’t necessarily know the names in advance
Remember !! and !!! only work with functions that use rlang
Can use list2(...) to turn ... into “tidy dots”, which can be unquoted and spliced
list2() is required if !! or !!! will be passed or used in ...
list2() is a wrapper around dots_list() with the most common defaults
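
A rough sketch of the difference, assuming the .homonyms argument of dots_list() behaves as described in the rlang documentation. The wrapper names collect() and collect_first() are made up for illustration.

```r
library(rlang)

# list2() with its defaults: supports !!!, :=, and ignores a trailing empty argument
collect <- function(...) list2(...)

# dots_list() exposes more control, e.g. how to treat duplicated names
collect_first <- function(...) dots_list(..., .homonyms = "first")

args <- list(a = 1, b = 2)
spliced  <- collect(!!!args, c = 3)       # list(a = 1, b = 2, c = 3)
trailing <- collect(a = 1, )              # trailing comma is ignored
deduped  <- collect_first(a = 1, a = 2)   # keeps only the first a
```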

No need for list2()

d <- function(...) data.frame(list(...))
d(x = c(1:3), y = c(2, 4, 6))
#>   x y
#> 1 1 2
#> 2 2 4
#> 3 3 6

Require list2()

vars <- list(x = c(1:3), y = c(2, 4, 6))
d(!!!vars)
#> Error in !vars: invalid argument type
d2 <- function(...) data.frame(list2(...))
d2(!!!vars)
#>   x y
#> 1 1 2
#> 2 2 4
#> 3 3 6
# Same result but x and y evaluated later
vars_expr <- exprs(x = c(1:3), y = c(2, 4, 6))
d2(!!!vars_expr)  
#>   x y
#> 1 1 2
#> 2 2 4
#> 3 3 6

Getting argument names (symbols) from variables

nm <- "z"
val <- letters[1:4]
d2(x = 1:4, !!nm := val)

#>   x z
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 4 d

`exec()` [Making your own …]

What if your function doesn’t have tidy dots?

Can’t use !! or := if the function doesn’t support rlang’s dynamic dots

my_mean <- function(x, arg_name, arg_val) {
  mean(x, !!arg_name := arg_val)
}

my_mean(c(NA, 1:10), arg_name = "na.rm", arg_val = TRUE)     
#> Error in `my_mean()`:
#> ! `:=` can only be used within dynamic dots.

Let’s use the … from exec()

exec(.fn, ..., .env = caller_env())

my_mean <- function(x, arg_name, arg_val) {
  exec("mean", x, !!arg_name := arg_val)
}

my_mean(c(NA, 1:10), arg_name = "na.rm", arg_val = TRUE)     
#> [1] 5.5

Note that you do not unquote arg_val.

exec() is also useful for mapping over a list of functions:

x <- c(runif(10), NA)
funs <- c("mean", "median", "sd")
purrr::map_dbl(funs, exec, x, na.rm = TRUE)

#> [1] 0.4445205 0.4886247 0.3166360

Base R `do.call`

do.call(what, args)

what is a function to call
args is a list of arguments to pass to the function.

nrow(mtcars)
#> [1] 32
mtcars3 <- do.call("rbind", list(mtcars, mtcars, mtcars))
nrow(mtcars3)
#> [1] 96
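
For comparison with the exec() version of my_mean(), here is a base-R sketch that builds the argument list by hand and passes it to do.call(). The name my_mean_base() is hypothetical.

```r
my_mean_base <- function(x, arg_name, arg_val) {
  # Unnamed first argument, plus one argument whose name comes from a variable
  args <- c(list(x), setNames(list(arg_val), arg_name))
  do.call("mean", args)
}

res <- my_mean_base(c(NA, 1:10), arg_name = "na.rm", arg_val = TRUE)
res
#> [1] 5.5
```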

Exercise 19.5.5 #1

One way to implement exec() is shown here. Describe how it works. What are the key ideas?

exec_ <- function(f, ..., .env = caller_env()){
  args <- list2(...)
  do.call(f, args, envir  = .env)
}
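
One way to explore the exercise is to call exec_() the way exec() was called earlier. This sketch repeats the definition so it runs on its own. It is an experiment, not a full answer.

```r
library(rlang)

exec_ <- function(f, ..., .env = caller_env()) {
  args <- list2(...)               # dynamic dots: handles !!, !!!, and :=
  do.call(f, args, envir = .env)   # call f with the collected list of arguments
}

# Argument name supplied from a variable
arg_name <- "na.rm"
r1 <- exec_("mean", c(NA, 1:10), !!arg_name := TRUE)

# Whole argument list spliced in
args <- list(x = c(NA, 1:10), na.rm = TRUE)
r2 <- exec_(mean, !!!args)

r1
r2
```

Both calls give 5.5, matching the exec() example above.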

Case Studies (side note)

Sometimes you want to run a bunch of models, without having to copy/paste each one.

BUT, you also want the summary function to show the appropriate model call, not one with hidden variables (e.g., lm(y ~ x, data = data)).
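
To make the problem concrete, here is a sketch of a naive wrapper (fit_naive() is a made-up name) whose recorded call shows the wrapper’s internal variable names instead of the actual formula and data:

```r
fit_naive <- function(y, x, data) {
  f <- reformulate(x, response = y)   # builds mpg ~ hp from strings
  lm(f, data = data)
}

fit <- fit_naive("mpg", "hp", mtcars)
fit$call
#> lm(formula = f, data = data)
```

The model itself is fine, but the recorded call no longer tells you which variables or data set were used.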

We can achieve this by building expressions and unquoting as needed:

library(purrr)

vars <- data.frame(x = c("hp", "hp"),
                   y = c("mpg", "cyl"))

x_sym <- syms(vars$x)
y_sym <- syms(vars$y)

formulae <- map2(x_sym, y_sym, \(x, y) expr(!!y ~ !!x))
formulae
#> [[1]]
#> mpg ~ hp
#> 
#> [[2]]
#> cyl ~ hp
models <- map(formulae, \(f) expr(lm(!!f, data = mtcars)))
summary(eval(models[[1]]))
#> 
#> Call:
#> lm(formula = mpg ~ hp, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.7121 -2.1122 -0.8854  1.5819  8.2360 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
#> hp          -0.06823    0.01012  -6.742 1.79e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.863 on 30 degrees of freedom
#> Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
#> F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

As a function:

lm_df <- function(df, data) {
  x_sym <- map(df$x, as.symbol)
  y_sym <- map(df$y, as.symbol)
  data <- enexpr(data)
  
  formulae <- map2(x_sym, y_sym, \(x, y) expr(!!y ~ !!x))
  models <- map(formulae, \(f) expr(lm(!!f, !!data)))
  
  map(models, \(m) summary(eval(m)))
}

vars <- data.frame(x = c("hp", "hp"),
                   y = c("mpg", "cyl"))
lm_df(vars, data = mtcars)
#> [[1]]
#> 
#> Call:
#> lm(formula = mpg ~ hp, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.7121 -2.1122 -0.8854  1.5819  8.2360 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
#> hp          -0.06823    0.01012  -6.742 1.79e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.863 on 30 degrees of freedom
#> Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
#> F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = cyl ~ hp, data = mtcars)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.27078 -0.74879 -0.06417  0.63512  1.74067 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 3.006795   0.425485   7.067 7.41e-08 ***
#> hp          0.021684   0.002635   8.229 3.48e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.006 on 30 degrees of freedom
#> Multiple R-squared:  0.693,  Adjusted R-squared:  0.6827 
#> F-statistic: 67.71 on 1 and 30 DF,  p-value: 3.478e-09