25.10 The problem of indirection explained

  • To make the problem clearer, we can use a made-up data frame whose columns happen to share names with the function's arguments:
df <- tibble(
  mean_var = 1,
  group_var = "g",
  group = 1,
  x = 10,
  y = 100
)

df |> 
  grouped_mean(group, x)
## # A tibble: 1 × 2
##   group_var `mean(mean_var)`
##   <chr>                <dbl>
## 1 g                        1
df |> 
  grouped_mean(group, y)
## # A tibble: 1 × 2
##   group_var `mean(mean_var)`
##   <chr>                <dbl>
## 1 g                        1
  • Regardless of how we call grouped_mean(), it always computes df |> group_by(group_var) |> summarize(mean(mean_var)), rather than df |> group_by(group) |> summarize(mean(x)) or df |> group_by(group) |> summarize(mean(y)).
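  • This is a problem of indirection. It arises because dplyr uses tidy evaluation, which lets you refer to the names of variables inside your data frame without any special treatment. Inside grouped_mean(), group_var and mean_var are therefore read as literal column names, not as placeholders for the arguments we passed.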
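
The behavior described above can be reproduced with a minimal sketch. It assumes grouped_mean() was defined earlier without any special handling of its arguments. The names grouped_mean_literal() and grouped_mean_embraced() are hypothetical, introduced only to put the two versions side by side. The second version uses embracing with {{ }}, which is the usual tidy evaluation remedy for this problem.

```r
library(dplyr)

# Made-up data frame from the text: two columns deliberately share
# names with the function arguments group_var and mean_var.
df <- tibble(mean_var = 1, group_var = "g", group = 1, x = 10, y = 100)

# Assumed original definition: group_var and mean_var are looked up
# as column names in df, so the arguments the caller passes are ignored.
grouped_mean_literal <- function(df, group_var, mean_var) {
  df |>
    group_by(group_var) |>
    summarize(mean(mean_var))
}

# Embraced version: {{ }} tells dplyr to use the expression the caller
# supplied for each argument instead of the argument's own name.
grouped_mean_embraced <- function(df, group_var, mean_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(mean({{ mean_var }}))
}

literal_x  <- df |> grouped_mean_literal(group, x)
literal_y  <- df |> grouped_mean_literal(group, y)
embraced_x <- df |> grouped_mean_embraced(group, x)
embraced_y <- df |> grouped_mean_embraced(group, y)
```

With this made-up data, both literal_x and literal_y summarize the mean_var column grouped by group_var, while embraced_x and embraced_y group by group and average x and y respectively. Note that a data frame without columns named group_var and mean_var would make the literal version fail with an error instead of silently returning the wrong result.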