Non-standard evaluation (NSE): dealing with expressions as function inputs in base R.
Tidy evaluation: quasiquotation, quosures and data masks.
We use base::eval() to evaluate, run, or execute expressions. It requires two arguments:
expr: the object to evaluate, either an expression or a symbol.
envir: the environment in which to evaluate the expression or where to look for the values. Defaults to the current environment (technically parent.frame(), which is the environment that eval() was called from).
What do we need?
The real source() is much more complex.
eval() needs both the code and the environment to run the code in.
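A minimal base R sketch of that pairing. The variable names (`code`, `e`, `res_env`, `res_here`) are illustrative only: the same quoted code gives different results depending on the environment it is evaluated in.

```r
# Capture the code without running it
code <- quote(x + y)

# Environment 1: a fresh environment that supplies its own x and y
e <- new.env()
assign("x", 1, envir = e)
assign("y", 10, envir = e)
res_env <- eval(code, envir = e)   # 11

# Environment 2: the current environment (the default for envir)
x <- 100
y <- 200
res_here <- eval(code)             # 300
```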
quosures are a data structure from rlang containing both the expression and the environment.
The name combines quoting and closure: a quosure quotes the expression and encloses the environment.
Three ways to create quosures, in the upcoming slides.
Used mostly for learning: new_quosure(), creates a quosure from its components.
Used in the real world: enquo() or enquos(), to capture user-supplied expressions. They record the environment in which the user supplied the expression.
Almost never used: quo() and quos(), to match to expr() and exprs().
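A short sketch of the three approaches side by side, assuming rlang is installed. The helper `capture()` and the object `z` are hypothetical names used only for illustration.

```r
library(rlang)

# 1. new_quosure(): build a quosure from an expression and an environment
q1 <- new_quosure(expr(x + y), env(x = 1, y = 10))

# 2. enquo(): capture the expression a user passed as a function argument
capture <- function(arg) enquo(arg)  # hypothetical helper
z <- 5
q2 <- capture(z * 2)

# 3. quo(): quote an expression together with the current environment
q3 <- quo(z + 1)

eval_tidy(q1)  # 11
eval_tidy(q2)  # 10
eval_tidy(q3)  # 6
```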
The purpose of the concept of quosure is to be able to pass a single object to rlang::eval_tidy(), as opposed to the expression-environment pair, as required for base::eval():
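For comparison, a small sketch (names `e` and `q` are illustrative) showing the two-object call to base::eval() next to the single-object call to rlang::eval_tidy():

```r
library(rlang)

e <- env(x = 2, y = 3)

# base::eval(): expression and environment are passed separately
res_base <- eval(expr(x * y), e)

# rlang::eval_tidy(): the quosure carries both
q <- new_quosure(expr(x * y), e)
res_tidy <- eval_tidy(q)

res_base  # 6
res_tidy  # 6
```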
Get the quosure components if you need them:
Or set them:
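A sketch of both directions using rlang's quosure accessors, quo_get_expr(), quo_get_env(), quo_set_expr() and quo_set_env(). The original slides may have used the more general get_expr()/get_env() family instead; that choice is an assumption here.

```r
library(rlang)

q <- new_quosure(expr(x + 1), env(x = 1))

# Getters
quo_get_expr(q)  # x + 1
quo_get_env(q)   # the environment holding x = 1

# Setters return a modified copy; q itself is unchanged
q_env  <- quo_set_env(q, env(x = 100))
q_expr <- quo_set_expr(q, expr(x * 50))

eval_tidy(q)       # 2
eval_tidy(q_env)   # 101
eval_tidy(q_expr)  # 50
```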
Quosures are just a convenience, but they are essential when it comes to working with ..., because you can have different arguments from ... associated with different environments.
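To illustrate, a sketch in which two arguments arrive through ... from different environments. The functions `f()` and `g()` are illustrative: each captured quosure remembers where its `x` should be looked up.

```r
library(rlang)

g <- function(...) {
  enquos(...)
}

f <- function(...) {
  x <- 1
  # Adds a second argument whose x lives in f()'s own environment
  g(..., f = x)
}

x <- 0
qs <- f(global = x)

# Same symbol x, two different environments, two different values
vals <- vapply(qs, eval_tidy, numeric(1))
vals  # global = 0, f = 1
```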
Formulas were the predecessor and inspiration for quosures because they also capture an expression and an environment.
#> Class 'formula' language ~runif(3)
#> ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
(This creates weird problems when functions return formulas, which in turn may drag the entire GlobalEnv with them. Have 10Gb worth of objects in there? Let’s store them all!!!)
There was an early version of tidy evaluation with formulas, but there’s no easy way to implement quasiquotation with them.
Under the hood, quosures (like formulas) are actually call objects with an attribute to store the environment.
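These claims can be checked directly; a small sketch, with `q` as an illustrative name:

```r
library(rlang)

q <- new_quosure(expr(x + y + z))

class(q)                  # "quosure" "formula"
is_call(q)                # TRUE
attr(q, ".Environment")   # the environment captured by the quosure
```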
With quasiquotation we can embed quosures in expressions.
And evaluate them:
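A sketch of embedding and evaluating, with illustrative names: the two quosures share the expression `x` but enclose different environments, so each `x` resolves to its own value.

```r
library(rlang)

q2 <- new_quosure(expr(x), env(x = 1))
q3 <- new_quosure(expr(x), env(x = 10))

# Unquote both quosures into a single expression
combined <- expr(!!q2 + !!q3)

eval_tidy(combined)  # 11
```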
Display: it is better to use expr_print(x) (note the subtle color differences indicating that the different xs come from different environments).
A data mask is a data frame in which the evaluated code looks for variables first, before falling back to the environment. Thus the evaluated code will look first for the columns in the specified data frame.
Used in packages like dplyr and ggplot2.
Supply the data mask as a second argument to eval_tidy():
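For instance, in the sketch below (names are illustrative) `y` is found in the data frame and `x` in the quosure's environment:

```r
library(rlang)

q1 <- new_quosure(expr(x * y), env(x = 100))
df <- data.frame(y = 1:10)

# y comes from the data mask (df), x from the quosure's environment
res <- eval_tidy(q1, df)
res  # 100 200 300 ... 1000
```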
Reimplementing base::with() for the purposes of working with a data.frame:
But we need to create the objects that are not part of our data mask
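A minimal sketch of such a function, here called `with2()` as in the text that follows; `df` and `x` are illustrative objects. Note that `x` is not a column of `df`, so it has to exist in the calling environment.

```r
library(rlang)

with2 <- function(data, expr) {
  expr <- enquo(expr)      # capture the expression and its environment
  eval_tidy(expr, data)    # evaluate with data as the mask
}

df <- data.frame(y = 1:10)
x <- 100                   # not in the data mask, so defined here

res <- with2(df, x * y)
res  # 100 200 300 ... 1000
```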
This is also doable with base::eval() instead of rlang::eval_tidy(), but a strict base implementation needs base::substitute() instead of enquo() (as we did for enexpr()), and we need to specify the environment explicitly.
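A base-only sketch of the same idea. The name `with3()` is hypothetical, and the calling environment is captured with parent.frame() as one reasonable way to specify the fallback environment.

```r
with3 <- function(data, expr) {
  expr <- substitute(expr)   # capture the expression only, no environment
  caller <- parent.frame()   # so the environment must be supplied by hand
  eval(expr, data, caller)   # look in data first, then in the caller
}

df <- data.frame(y = 1:10)
x <- 100

res <- with3(df, x * y)
res  # 100 200 300 ... 1000
```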
When you write
with2(df, y = x)
Do you mean x as the column in df, or as an object in the environment?
(devtools::check() finds these and complains about ‘no visible binding for global variable’.)
An object value can come from the env or from the data
rlang provides pronouns:
.data$x: take x from the data mask.
.env$x: take x from the environment.
base::subset() works like dplyr::filter(): it selects rows of a data frame given an expression.
What do we need?
#> a b c
#> 1 1 5 5
#> 5 5 1 1
#> Error in `subset2()`:
#> ! rows argument does not evaluate to a subsetting condition
#> Error in `subset2()`:
#> ! rows argument does not evaluate to a subsetting condition
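A sketch of a subset2() consistent with the output above. The data frame `sample_df` and the logical check that raises the error are assumptions reconstructed from the printed results, not necessarily the original code.

```r
library(rlang)

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 1, 4, 1))

subset2 <- function(data, rows) {
  rows <- enquo(rows)                 # quote the condition
  rows_val <- eval_tidy(rows, data)   # evaluate it with data as the mask
  if (!is.logical(rows_val)) {
    # Assumed check: anything that is not a logical vector is rejected
    abort("rows argument does not evaluate to a subsetting condition")
  }
  data[rows_val, , drop = FALSE]
}

res <- subset2(sample_df, b == c)   # keeps rows 1 and 5

# A non-logical expression triggers the error
err <- tryCatch(subset2(sample_df, a + b), error = function(e) e)
```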
Let us implement one more feature of base::subset(select=_some columns_):
(base trivia: why do you need drop = FALSE?)
#> b c d
#> 1 2 3 4
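One way to sketch the column selection, with the hypothetical name `select2()`. The trick is a data mask that maps each column name to its position, so that an expression like `b:d` evaluates to `2:4`.

```r
library(rlang)

select2 <- function(data, ...) {
  dots <- enquos(...)
  # Data mask: column names bound to their integer positions
  vars <- as.list(set_names(seq_along(data), names(data)))
  cols <- unlist(lapply(dots, eval_tidy, vars))
  # drop = FALSE keeps a data frame even when one column is selected
  data[, cols, drop = FALSE]
}

df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5)
res <- select2(df, b:d)
res_one <- select2(df, c)
```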
tidyselect helpers: provide more functions that return logical expressions to be added to cols.
Most of the time we will not call eval_tidy() in our own code, but we often end up calling a function that uses it (becoming developer AND user).
Use case: resample and subset
We have a function that resamples a dataset:
#> a b c
#> 2 2 4 3
#> 2.1 2 4 3
#> 5 5 1 1
#> 3 3 3 2
#> 4 4 2 4
#> 2.2 2 4 3
#> 2.3 2 4 3
#> 5.1 5 1 1
#> 2.4 2 4 3
#> 3.1 3 3 2
But we also want to subset, so we want a function that allows us to resample and subset (with subset2()) in a single step.
First attempt:
What happened?
subsample() doesn’t quote any arguments and cond is evaluated normally. The caller environment does not have b and c (or what is worse, if it does, the intent of the condition goes out the window.)
So we have to quote cond and unquote it when we pass it to subset2()
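A sketch of the quote-and-unquote version. The bodies of `subset2()` and `resample()` are simplified stand-ins written for this example, and `df` is illustrative data.

```r
library(rlang)

# Simplified stand-ins for the helpers discussed above
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  data[eval_tidy(rows, data), , drop = FALSE]
}
resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)          # quote the user's condition
  df <- subset2(df, !!cond)    # unquote it when passing it on
  resample(df, n)              # n defaults to the subsetted row count
}

df <- data.frame(x = c(1, 1, 1, 2, 2), y = 1:5)
res <- subsample(df, x == 1)
```

Because `n` is evaluated lazily, its default `nrow(df)` refers to the data frame after subsetting, so `res` has three rows.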
Be careful! There is potential ambiguity:
What would happen if x exists in the calling environment but doesn’t exist in df? Or if val also exists in df?
So, as developers of threshold_x() and users of subset2(), we have to add disambiguating pronouns:
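A sketch of threshold_x() with pronouns. The `subset2()` body is a simplified stand-in, and the data frames `no_x` and `has_val` are illustrative: `.data$x` must be a column, and `.env$val` must be the function argument.

```r
library(rlang)

# Simplified stand-in for subset2()
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  data[eval_tidy(rows, data), , drop = FALSE]
}

threshold_x <- function(df, val) {
  # .data$x: only the column x; .env$val: only the argument val
  subset2(df, .data$x >= .env$val)
}

# x exists in the calling environment but not in the data: now an error
x <- 10
no_x <- data.frame(y = 1:3)
err <- tryCatch(threshold_x(no_x, 2), error = function(e) e)

# val also exists in the data: the argument still wins
has_val <- data.frame(x = 1:3, val = 9:11)
res <- threshold_x(has_val, 2)   # rows where x >= 2
```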
Thresholding on steroids:
It may not be possible to evaluate expr only in the data mask, because the data mask contains neither functions nor operators such as + or ==.
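A sketch of the generalised version, here named `threshold_expr()`; the `subset2()` body is a simplified stand-in. Since the user's expression needs the environment for operators and functions, the ambiguity in `val` is removed by unquoting its value rather than by using a pronoun.

```r
library(rlang)

# Simplified stand-in for subset2()
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  data[eval_tidy(rows, data), , drop = FALSE]
}

threshold_expr <- function(df, expr, val) {
  expr <- enquo(expr)
  # !!val inlines the value itself, so no lookup of "val" ever happens
  subset2(df, !!expr >= !!val)
}

df <- data.frame(x = 1:10)
res <- threshold_expr(df, x * 2, 15)   # keeps x = 8, 9, 10
```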
Just remember:
As a general rule of thumb, as a function author it’s your responsibility to avoid ambiguity with any expressions that you create; it’s the user’s responsibility to avoid ambiguity in expressions that they create.
substitute()
The base world forces us to evaluate in the caller environment rather than in the environment where the expression was defined (as a quosure does), which costs flexibility.
match.call()
match.call() captures the entire call. A number of base functions (and base-only packages like survey) use it:
lm() itself cannot really do better as that was all it received, and it could not possibly figure out the components. To overcome this problem, we need to capture the arguments as expressions, create the call to lm() using unquoting, then evaluate that call.
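A sketch of that capture, build and evaluate pattern. The wrapper name `lm3()` and its `env` argument are illustrative; the example uses the built-in mtcars data.

```r
library(rlang)

lm3 <- function(formula, data, env = caller_env()) {
  # Capture the arguments as expressions
  formula <- enexpr(formula)
  data <- enexpr(data)

  # Build the call to lm() by unquoting the captured pieces
  lm_call <- expr(lm(!!formula, data = !!data))
  expr_print(lm_call)

  # Evaluate the call in the caller's environment
  eval(lm_call, env)
}

fit <- lm3(mpg ~ disp, mtcars)
fit
```

The call stored in the fitted model now shows the actual formula and data set instead of the wrapper's argument names.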
#> lm(mpg ~ disp, data = mtcars)
#>
#> Call:
#> lm(formula = mpg ~ disp, data = mtcars)
#>
#> Coefficients:
#> (Intercept) disp
#> 29.59985 -0.04122