12.4 Tidy-selection

As well as data-masking, there’s one other important part of tidy evaluation: tidy-selection. Tidy-selection provides a concise way of selecting columns by position, name, or type. It’s used in dplyr::select() and dplyr::across(), and in many functions from tidyr, like pivot_longer(), pivot_wider(), separate(), extract(), and unite().
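As a rough illustration of those three styles of selection, here is a minimal sketch using the built-in mtcars and iris datasets. The object names (by_position, by_name, and so on) are only illustrative.

```r
library(dplyr)

# By position: the first two columns of mtcars
by_position <- mtcars %>% select(1:2)

# By name: bare column names, or a helper that matches on names
by_name <- mtcars %>% select(mpg, cyl)
by_prefix <- mtcars %>% select(starts_with("d"))

# By type: where() applies a predicate function to each column
by_type <- iris %>% select(where(is.numeric))

names(by_position)
names(by_prefix)
names(by_type)
```

The same selection syntax should work anywhere a function documents an argument as using tidy-selection, such as the cols argument of pivot_longer().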

12.4.1 Indirection

To refer to variables indirectly, use any_of() or all_of(). Both expect a character vector env-variable containing the names of data-variables. The only difference is what happens if you supply a variable name that doesn’t exist in the input: all_of() will throw an error, while any_of() will silently ignore it.

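library(shiny)
library(dplyr)

ui <- fluidPage(
  selectInput("vars", "Variables", names(mtcars), multiple = TRUE),
  tableOutput("data")
)

server <- function(input, output, session) {
  output$data <- renderTable({
    # Wait until at least one variable has been selected
    req(input$vars)
    # input$vars is a character vector, so wrap it in all_of()
    mtcars %>% select(all_of(input$vars))
  })
}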
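The difference between all_of() and any_of() is easiest to see outside of Shiny. The sketch below assumes a hypothetical character vector, vars, that contains one name that is not a column of mtcars; tryCatch() is used only so that the error from all_of() can be captured rather than stopping the script.

```r
library(dplyr)

# Hypothetical user input: the last name is not a column of mtcars
vars <- c("mpg", "cyl", "not_a_column")

# any_of() silently ignores names that are missing from the data
lenient <- mtcars %>% select(any_of(vars))
names(lenient)

# all_of() throws an error instead; here it is caught and replaced
# with a marker string so the outcome can be inspected
strict <- tryCatch(
  mtcars %>% select(all_of(vars)),
  error = function(e) "error"
)
strict
```

In an app like the one above, the choices come from names(mtcars), so every selected name should exist and all_of() is a reasonable default: an error would signal a genuine bug rather than bad user input.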

12.4.2 Tidy-selection and data-masking

Working with multiple variables is trivial when you’re working with a function that uses tidy-selection: you can just pass a character vector of variable names into any_of() or all_of(). Wouldn’t it be nice if we could do that in data-masking functions too? That’s the idea of the across() function, added in dplyr 1.0.0. It allows you to use tidy-selection inside data-masking functions.

across() is typically used with either one or two arguments. The first argument selects variables, and is useful on its own in functions like group_by() or distinct(). For example, the following app groups by whichever variables the user selects and counts the rows in each group.

ui <- fluidPage(
  selectInput("vars", "Variables", names(mtcars), multiple = TRUE),
  tableOutput("count")
)

server <- function(input, output, session) {
  output$count <- renderTable({
    req(input$vars)
    
    mtcars %>% 
      group_by(across(all_of(input$vars))) %>% 
      summarise(n = n(), .groups = "drop")
  })
}

This app allows you to select any number of variables and count their unique combinations. See live at https://hadley.shinyapps.io/ms-across.
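The same pipeline can be tried at the console without the app. This is a minimal sketch in which vars is a hypothetical stand-in for input$vars.

```r
library(dplyr)

# Hypothetical stand-in for input$vars
vars <- c("cyl", "am")

# across() lets group_by(), a data-masking function, accept a
# tidy-selection built from a character vector
counts <- mtcars %>%
  group_by(across(all_of(vars))) %>%
  summarise(n = n(), .groups = "drop")

counts
```

The result has one row per combination of cyl and am that appears in the data, with the grouping columns first and the count in n.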

The second argument is a function (or list of functions) that’s applied to each selected column. That makes it a good fit for mutate() and summarise() where you typically want to transform each variable in some way. For example, the following code lets the user select any number of grouping variables, and any number of variables to summarise with their means.

ui <- fluidPage(
  selectInput("vars_g", "Group by", names(mtcars), multiple = TRUE),
  selectInput("vars_s", "Summarise", names(mtcars), multiple = TRUE),
  tableOutput("data")
)

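server <- function(input, output, session) {
  output$data <- renderTable({
    mtcars %>% 
      # Group by the user-selected variables (possibly none)
      group_by(across(all_of(input$vars_g))) %>% 
      # Compute the mean of each selected variable, plus the group size
      summarise(across(all_of(input$vars_s), mean), n = n(), .groups = "drop")
  })
}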
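To see what the two-argument form of across() produces, here is a minimal sketch outside of Shiny. vars_g and vars_s are hypothetical stand-ins for input$vars_g and input$vars_s, and the second pipeline shows the "list of functions" variant mentioned above.

```r
library(dplyr)

# Hypothetical stand-ins for input$vars_g and input$vars_s
vars_g <- "cyl"
vars_s <- c("mpg", "wt")

# One function: each selected column is replaced by its group mean
means <- mtcars %>%
  group_by(across(all_of(vars_g))) %>%
  summarise(across(all_of(vars_s), mean), n = n(), .groups = "drop")

# A named list of functions: one output column per column/function
# pair, named "{column}_{function}" by default
stats <- mtcars %>%
  group_by(across(all_of(vars_g))) %>%
  summarise(
    across(all_of(vars_s), list(mean = mean, sd = sd)),
    .groups = "drop"
  )

names(means)
names(stats)
```

Note that across() does not select grouping columns, so choosing the same variable in both inputs of the app above would be expected to produce an error from all_of().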