20.10 Using tidy evaluation

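Most of the time we won’t call eval_tidy() directly; instead we’ll call a function that uses it. When we wrap such a function in one of our own, we become both a developer (of the wrapper) and a user (of the tidy-evaluation function).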
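For reference, here is a minimal sketch of the kind of function meant here. The definition of subset2() is not part of this section; the version below is an assumption based on how it is used: it quotes its argument with rlang’s enquo() and evaluates it with eval_tidy() against the data frame. The contents of sample_df are likewise hypothetical, reconstructed from the printed output further down.

```r
library(rlang)

# Hypothetical data: values reconstructed from the printed output in this section
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))

# Assumed definition of subset2(): a data-masking version of subset()
subset2 <- function(data, rows) {
  rows <- enquo(rows)                # capture the expression and its environment
  rows_val <- eval_tidy(rows, data)  # evaluate it with the columns of `data` in scope
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

res <- subset2(sample_df, b == c)
res
#>   a b c
#> 1 1 5 5
#> 5 5 1 1
```

Here b and c are found as columns of sample_df because eval_tidy() evaluates the quoted expression inside a data mask.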

Use case: resample and subset

We have a function that resamples a dataset:

resample <- function(df, n) {
  # draw n row indices with replacement, keeping all columns
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

resample(sample_df, 10)
#>     a b c
#> 4   4 2 4
#> 3   3 3 2
#> 2   2 4 3
#> 3.1 3 3 2
#> 1   1 5 5
#> 5   5 1 1
#> 1.1 1 5 5
#> 5.1 5 1 1
#> 3.2 3 3 2
#> 5.2 5 1 1

Now we also want to subset the data (with subset2()), and we want a single function that subsets and resamples in one step.

First attempt:

subsample <- function(df, cond, n = nrow(df)) {
  df <- subset2(df, cond)
  resample(df, n)
}

subsample(sample_df, b == c, 10)
#> Warning in b == c: longer object length is not a multiple of shorter object
#> length
#> Error in eval_tidy(rows, data): 'list' object cannot be coerced to type 'integer'

What happened?

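subsample() doesn’t quote any of its arguments. subset2() does quote what it receives, but what it receives is only the symbol cond, so cond is evaluated normally: b == c is evaluated in the calling environment instead of inside the data mask built from sample_df. In a clean session this gives an “object 'b' not found” error; the confusing warning and error above suggest that objects named b and c already existed in that session.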
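To isolate the problem, here is a sketch meant to run in a fresh session (so no object named b exists). It reuses an assumed subset2() and the hypothetical sample_df; subsample_naive() is a hypothetical, trimmed-down version of the first attempt. subset2() quotes the symbol cond, and when that symbol is looked up R forces the promise, evaluating b == c in the caller’s environment rather than in the data mask.

```r
library(rlang)

# Hypothetical data and assumed subset2(), as used throughout this section
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

# Naive wrapper: subset2() captures the symbol `cond`, not the user's expression
subsample_naive <- function(df, cond) {
  subset2(df, cond)
}

# `cond` is not a column, so the lookup falls through to the promise for
# `cond`, which evaluates `b == c` outside the data mask: an
# "object not found" error in a session with no `b`.
msg <- tryCatch(subsample_naive(sample_df, b == c),
                error = function(e) conditionMessage(e))
msg
```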

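So we have to quote cond with enquo() and unquote it with !! when we pass it to subset2(). That way subset2() receives the user’s expression together with its environment, rather than the bare symbol cond.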

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)        # capture the expression and its environment

  df <- subset2(df, !!cond)  # inject the captured quosure into subset2()
  resample(df, n)
}

subsample(sample_df, b == c, 10)
#>     a b c
#> 5   5 1 1
#> 1   1 5 5
#> 1.1 1 5 5
#> 5.1 5 1 1
#> 5.2 5 1 1
#> 1.2 1 5 5
#> 1.3 1 5 5
#> 5.3 5 1 1
#> 5.4 5 1 1
#> 5.5 5 1 1
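Because enquo() captures a quosure, the expression keeps a link to the environment where it was written, so it can mix columns with the caller’s own variables. The sketch below assumes the same subset2() and hypothetical sample_df; pick() and min_b are hypothetical names. It also shows that the default n = nrow(df) is evaluated lazily, after df has been replaced by the subset.

```r
library(rlang)

# Hypothetical data and assumed subset2()
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

resample <- function(df, n) {
  idx <- sample(nrow(df), n, replace = TRUE)
  df[idx, , drop = FALSE]
}

subsample <- function(df, cond, n = nrow(df)) {
  cond <- enquo(cond)
  df <- subset2(df, !!cond)
  resample(df, n)
}

# `min_b` lives in pick()'s environment, not in sample_df;
# the quosure remembers where to find it.
pick <- function(min_b) subsample(sample_df, b >= min_b, 6)

set.seed(1)
out <- pick(4)

# Default n is nrow() of the *subset* (2 rows where b >= 4)
n_default <- nrow(subsample(sample_df, b >= 4))
```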

Be careful: wrappers like this can introduce ambiguity. Consider:

threshold_x <- function(df, val) {
  subset2(df, x >= val)
}

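What happens if x exists in the calling environment but not in df? Or if df also has a column named val? In the first case the x from the environment is silently used; in the second, the column val silently takes precedence over the argument val.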
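A sketch of both failure modes, reusing the assumed subset2(). The global x and the data frames no_x and has_val are hypothetical.

```r
library(rlang)

# Assumed subset2(), as used throughout this section
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

threshold_x <- function(df, val) {
  subset2(df, x >= val)
}

x <- 10  # stray global variable

# Case 1: no column `x`, so the global x is used: 10 >= 5 is a single TRUE,
# which recycles and keeps every row.
no_x <- data.frame(y = 1:3)
r1 <- threshold_x(no_x, 5)

# Case 2: the column `val` shadows the argument `val`, so the filter is
# effectively x >= 9 instead of x >= 2 and no rows survive.
has_val <- data.frame(x = 1:3, val = c(9, 9, 9))
r2 <- threshold_x(has_val, 2)
```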

So, as developers of threshold_x() and users of subset2(), we should make the source of each name explicit with the .data and .env pronouns:

threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}
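Repeating the two cases with the pronoun version (same assumptions: the assumed subset2() and hypothetical data) shows that .data$x refuses to fall back to the global x, and .env$val ignores the column and uses the argument.

```r
library(rlang)

# Assumed subset2(), as used throughout this section
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

threshold_x <- function(df, val) {
  subset2(df, .data$x >= .env$val)
}

x <- 10  # stray global variable, now ignored

# .data$x only looks in the data, so a missing column is an error
no_x <- data.frame(y = 1:3)
err <- tryCatch({ threshold_x(no_x, 5); NULL },
                error = function(e) conditionMessage(e))

# .env$val skips the data and finds the argument val = 2
has_val <- data.frame(x = 1:3, val = c(9, 9, 9))
r2 <- threshold_x(has_val, 2)
r2$x
#> [1] 2 3
```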

Just remember:

As a general rule of thumb, as a function author it’s your responsibility to avoid ambiguity with any expressions that you create; it’s the user’s responsibility to avoid ambiguity in expressions that they create.
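The same pronouns are available to users who call a tidy-evaluation function directly. A sketch, assuming the same subset2() and hypothetical sample_df, where the user has a variable b that clashes with a column name:

```r
library(rlang)

# Hypothetical data and assumed subset2()
sample_df <- data.frame(a = 1:5, b = 5:1, c = c(5, 3, 2, 4, 1))
subset2 <- function(data, rows) {
  rows <- enquo(rows)
  rows_val <- eval_tidy(rows, data)
  stopifnot(is.logical(rows_val))
  data[rows_val, , drop = FALSE]
}

b <- 3  # the user's own variable, same name as a column of sample_df

# A bare `b` would mean the column (the data mask wins);
# pronouns make each reference explicit.
res <- subset2(sample_df, .data$b > .env$b)
res$a
#> [1] 1 2
```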