Tidy evaluation: quasiquotation, quosures and data masks
We use eval() to evaluate, run, or execute expressions. It takes two key arguments:
expr: the object to evaluate, either an expression or a symbol.
envir: the environment in which to evaluate the expression or where to look for the values. Defaults to the current environment.
source()
What do we need?
The real source() is much more complex.
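As a minimal sketch of the idea (source2() and its arguments are illustrative names, not base R's implementation), a simplified source() only has to read a file, parse it into expressions, and evaluate each one in turn with eval():

```r
# Hypothetical simplified source(): parse a file, then evaluate each
# top-level expression in `env`, returning the last result invisibly.
source2 <- function(path, env = parent.frame()) {
  exprs <- parse(file = path)   # expression vector, one element per top-level call
  res <- NULL
  for (e in exprs) {
    res <- eval(e, env)
  }
  invisible(res)
}

# Usage: write two lines to a temporary file and run them.
tmp <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "a * 21"), tmp)
out <- source2(tmp)
out
```

Evaluating in the caller's environment by default means that assignments made by the file (here, a) become visible to whoever called source2().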
Quosures are a data structure from rlang containing both an expression and an environment.
The name combines quoting + closure, because a quosure quotes the expression and encloses the environment.
Three ways to create them:
new_quosure(), which creates a quosure from its components. With a quosure, we can use eval_tidy() directly. We can also get its components (quo_get_expr(), quo_get_env()) or set them (quo_set_expr(), quo_set_env()).
enquo() or enquos(), to capture user-supplied expressions. They take the environment from where they’re created.
quo() and quos(), to match expr() and exprs()
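The three approaches can be sketched as follows. The object names and the capture() helper are made up for illustration; only the rlang functions themselves come from the text above.

```r
library(rlang)

# 1. Build a quosure from its components: an expression plus an environment.
q1 <- new_quosure(expr(x + y), env(x = 1, y = 10))
eval_tidy(q1)              # evaluates x + y inside the stored environment

# Get its components ...
quo_get_expr(q1)
quo_get_env(q1)

# ... or set them (this returns a modified copy).
q2 <- quo_set_env(q1, env(x = 100, y = 10))
eval_tidy(q2)

# 2. Capture a user-supplied argument together with its environment.
capture <- function(arg) enquo(arg)   # hypothetical helper
z <- 5
q3 <- capture(z * 2)
eval_tidy(q3)

# 3. quo() quotes an expression and records the current environment.
q4 <- quo(z + 1)
eval_tidy(q4)
```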
....
Quosures are just a convenience, but they are essential when it comes to working with ..., because you can have each argument from ... associated with a different environment.
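A small sketch of why this matters, adapted from the kind of example used in Advanced R (the function names f and g are arbitrary): each argument passed through ... keeps the environment it was written in, so two arguments that both mention x can evaluate to different values.

```r
library(rlang)

g <- function(...) {
  enquos(...)          # one quosure per argument
}

f <- function(...) {
  x <- 1               # local x, visible only inside f()
  g(..., f = x)        # this x belongs to f()'s environment
}

x <- 0                 # x in the calling environment
qs <- f(global = x)    # this x belongs to the calling environment

# Each quosure is evaluated in its own environment.
res <- vapply(qs, eval_tidy, numeric(1))
res
```

The argument named global sees the outer x and the argument named f sees the x defined inside f(), even though both quosures were captured by the same enquos() call.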
Formulas were the inspiration for quosures because they also capture an expression and an environment
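For instance, a one-sided formula created at the top level records the environment it was created in, which str() reveals. This is a guess at the call behind the output shown next, since the original chunk is not included in these notes:

```r
# A formula stores both its expression and the environment it was created in.
f <- ~runif(3)
str(f)
```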
#> Class 'formula' language ~runif(3)
#> ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
There was an early version of tidy evaluation with formulas, but there’s no easy way to implement quasiquotation with them.
Like formulas, quosures are actually call objects with an attribute that stores the environment
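This can be checked directly; a minimal sketch (the object name q is arbitrary):

```r
library(rlang)

q <- new_quosure(expr(x + y + z))

class(q)                  # the quosure class sits on top of "formula"
is.call(q)                # under the hood it is a call object
attr(q, ".Environment")   # the environment is stored as an attribute
```

Working with these internals directly is rarely needed; quo_get_expr() and quo_get_env() are the supported accessors.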
Nested quosures
With quasiquotation we can embed quosures in expressions and evaluate them.
For printing, it’s better to use expr_print(x)
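A short sketch of nested quosures (object names are illustrative): two quosures that both refer to x, each in its own environment, are unquoted into a single expression.

```r
library(rlang)

q2 <- new_quosure(expr(x), env(x = 1))
q3 <- new_quosure(expr(x), env(x = 10))

# Embed both quosures in one expression with !!
nested <- expr(!!q2 + !!q3)

# Each x is looked up in the environment of its own quosure.
total <- eval_tidy(nested)
total

# expr_print() shows the embedded quosures more readably than print().
expr_print(nested)
```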
A data mask is a data frame where the evaluated code will look first for its variable definitions.
It is used in packages like dplyr and ggplot2.
To use it we need to supply the data mask as a second argument to eval_tidy()
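A minimal sketch that is consistent with the output shown next (the exact chunk is not included in these notes, so the object names are assumptions): y is found in the data mask, while x is found in the quosure's environment.

```r
library(rlang)

q1 <- new_quosure(expr(x * y), env(x = 100))
df <- data.frame(y = 1:10)

# The data mask is the second argument: y comes from df, x from the quosure env.
res <- eval_tidy(q1, df)
res
```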
#> [1] 100 200 300 400 500 600 700 800 900 1000
Everything together, in one function.
But we need to create the objects that are not part of our data mask
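Putting capture and evaluation together might look like the sketch below. The name with2() follows Advanced R; the data frame and the value of x are assumptions chosen for illustration.

```r
library(rlang)

# Quote the user's expression, then evaluate it with `data` as the data mask.
with2 <- function(data, expr) {
  expr <- enquo(expr)
  eval_tidy(expr, data)
}

df <- data.frame(y = 1:10)
x <- 100                  # not in df, so it must exist in the calling environment

res <- with2(df, x * y)
res
```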
This is also doable with base::eval() instead of rlang::eval_tidy(), but we have to use base::substitute() instead of enquo() (like we did for enexpr()) and we need to specify the environment.
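A base-R sketch of the same function, with no rlang involved (with3() is an illustrative name): substitute() captures the expression, and the caller's environment has to be passed to eval() by hand.

```r
# Base-R version: capture with substitute(), evaluate with eval().
with3 <- function(data, expr) {
  expr <- substitute(expr)
  # `data` is searched first; parent.frame() is the fallback for other names.
  eval(expr, data, parent.frame())
}

df <- data.frame(y = 1:10)
x <- 100

res <- with3(df, x * y)
res
```

Because substitute() returns a bare expression without an environment, this version can look in the wrong place when the expression has passed through several function calls. That is the problem quosures solve.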
Ambiguity!!
An object value can come from the env or from the data mask
q1 <- new_quosure(expr(x * y + x), env = env(x = 1))
df <- data.frame(y = 1:5,
x = 10)
eval_tidy(q1, df)
#> [1] 20 30 40 50 60
We use pronouns:
.data$x: x from the data mask.
.env$x: x from the environment.
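A small sketch of the pronouns in action, reusing the shape of the data frame above (the value of the environment's x is an assumption): the expression states explicitly which x is meant in each position.

```r
library(rlang)

df <- data.frame(y = 1:5, x = 10)
x <- 1

# .data$x is the column (10); .env$x is the variable in the environment (1).
res <- eval_tidy(quo(.data$x * y + .env$x), df)
res
```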
base::subset()
base::subset() works like dplyr::filter(): it selects rows of a data frame given an expression.
What do we need?
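One possible answer, sketched along the lines of Advanced R's subset2() (the example data frame is an assumption): quote the row condition, evaluate it with the data frame as the data mask, and use the logical result to index rows.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)                   # quote the condition
  rows_val <- eval_tidy(rows, data)     # evaluate it inside the data mask
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]        # keep the matching rows
}

sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
res <- subset2(sample_df, b == c)
res
```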
Most of the time we might not call it directly, but call a function that uses eval_tidy() (becoming developer AND user)
Use case: resample and subset
We have a function that resamples a dataset:
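A sketch of such a function, following Advanced R. The data frame is an assumption reconstructed from the output shown next (row 2 does not appear there), and because the rows are drawn at random, a rerun will generally produce a different sample.

```r
# Draw n rows with replacement.
resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
out <- resample(df, 10)
out
```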
#> a b c
#> 4 4 2 4
#> 3 3 3 2
#> 1 1 5 5
#> 1.1 1 5 5
#> 3.1 3 3 2
#> 5 5 1 1
#> 5.1 5 1 1
#> 3.2 3 3 2
#> 5.2 5 1 1
#> 4.1 4 2 4
But we also want to subset, so we want to create a function that allows us to resample and subset (with subset2()) in a single step.
First attempt:
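A sketch of what a naive first attempt might look like (subset2() and resample() are repeated so the example runs on its own; the data frame is an assumption). The call fails because b is looked up as an ordinary variable, assuming no object named b exists in the calling environment.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

# Naive version: cond is passed straight through to subset2().
subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}

df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))

# try() keeps the script running and records the error.
res <- try(subsample(df, b == c, 10), silent = TRUE)
inherits(res, "try-error")
```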
What happened?
subsample() doesn’t quote any arguments, so cond is evaluated normally.
So we have to quote cond and unquote it when we pass it to subset2()
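A sketch of the corrected version, with the helpers repeated so it runs on its own (the data frame is an assumption, and the sampled rows vary between runs): cond is captured with enquo() and passed on with !!.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)            # quote the user's condition
  df <- subset2(df, !!cond)      # unquote it into subset2()
  resample(df, n)
}

df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
out <- subsample(df, b == c, 10)
out
```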
#> a b c
#> 5 5 1 1
#> 5.1 5 1 1
#> 5.2 5 1 1
#> 1 1 5 5
#> 5.3 5 1 1
#> 1.1 1 5 5
#> 5.4 5 1 1
#> 5.5 5 1 1
#> 1.2 1 5 5
#> 1.3 1 5 5
Be careful! There is potential ambiguity:
What would happen if x exists in the calling environment but doesn’t exist in df? Or if val also exists in df?
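Both situations can be reproduced with a sketch of threshold_x() in the style of Advanced R. The definition is not shown in these notes, so the function body and the data frames here are assumptions.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Ambiguous: x and val may each resolve to a column or to a variable.
threshold_x <- function(df, val) {
  subset2(df, x >= val)
}

# Case 1: x is missing from df but exists in the calling environment.
x <- 10
no_x <- data.frame(y = 1:3)
case1 <- threshold_x(no_x, 2)     # silently uses x = 10 and keeps every row
case1

# Case 2: val is also a column of df, which masks the argument val.
has_val <- data.frame(x = 1:3, val = 9:11)
case2 <- threshold_x(has_val, 2)  # compares x with the column and keeps no rows
case2
```

Neither call raises an error, which is what makes this kind of ambiguity easy to miss.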
So, as developers of threshold_x() and users of subset2(), we have to add some pronouns:
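A sketch of the pronoun version, again with assumed data frames: .data$x must be a column and .env$val must be the function argument, so the two problem cases now behave predictably.

```r
library(rlang)

subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Unambiguous: x must come from the data, val from the function's environment.
threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}

x <- 10
no_x <- data.frame(y = 1:3)
case1 <- try(threshold_x(no_x, 2), silent = TRUE)   # now an error: no column x
inherits(case1, "try-error")

has_val <- data.frame(x = 1:3, val = 9:11)
case2 <- threshold_x(has_val, 2)                    # uses the argument val = 2
case2
```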
Just remember:
As a general rule of thumb, as a function author it’s your responsibility to avoid ambiguity with any expressions that you create; it’s the user’s responsibility to avoid ambiguity in expressions that they create.
Check 20.6 in the book!