12.2 Motivation

Below, we create an app that lets you filter on a numeric variable, keeping only the rows where that variable is greater than a threshold. The app runs without error, but it doesn’t return the correct result: with the default minimum of 1, every row shown has a carat value less than 1. The goal of this chapter is to help you understand why this doesn’t work, and why dplyr thinks you have asked for filter(diamonds, "carat" > 1).

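library(shiny)
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

num_vars <- c("carat", "depth", "table", "price", "x", "y", "z")
ui <- fluidPage(
  selectInput("var", "Variable", choices = num_vars),
  numericInput("min", "Minimum", value = 1),
  tableOutput("output")
)
server <- function(input, output, session) {
  # Intentionally incorrect: input$var is a string such as "carat",
  # so filter() compares that string with input$min, not the column.
  data <- reactive(diamonds %>% filter(input$var > input$min))
  output$output <- renderTable(head(data()))
}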
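
The incorrect result can be reproduced outside Shiny. The sketch below is only an illustration: var_name and threshold are hypothetical stand-ins for input$var and input$min, and it assumes dplyr and ggplot2 are installed. Because var_name holds a string, R compares "carat" with 1 as text, which yields a single logical value that filter() applies to every row.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

var_name <- "carat"  # hypothetical stand-in for input$var
threshold <- 1       # hypothetical stand-in for input$min

# A string is compared with a number: R coerces 1 to "1" and compares
# the two strings, so the carat column is never consulted.
comparison <- var_name > threshold

# filter() sees the same single value and recycles it across all rows.
wrong <- diamonds %>% filter(var_name > threshold)

print(comparison)
print(c(kept = nrow(wrong), total = nrow(diamonds)))
```

In common locales letters sort after digits, so the comparison is TRUE and no rows are dropped, which is why the app shows rows with carat below the minimum.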

This is a problem of indirection: normally when using tidyverse functions you type the name of the variable directly in the function call. But now you want to refer to it indirectly: the variable (carat) is stored inside another variable (input$var). That sentence uses “variable” in two different senses, so it helps to introduce two terms:

  • An env-variable (environment variable) is a “programming” variable that you create with <-. input$var is an env-variable.

  • A data-variable (data frame variable) is a “statistical” variable that lives inside a data frame. carat is a data-variable.
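
A minimal sketch of the two kinds of variable, using only base R. The data frame df and the name var_name are hypothetical and exist purely for illustration.

```r
# carat is a data-variable: it exists only as a column of df.
df <- data.frame(carat = c(0.5, 1.2, 2.0))

# var_name is an env-variable: it was created with <- and happens to
# hold the name of a data-variable.
var_name <- "carat"

env_has_var_name <- exists("var_name")      # found in the environment
env_has_carat <- exists("carat")            # not an object on its own
df_has_carat <- "carat" %in% names(df)      # but it is a column of df

# Base R can follow the indirection with [[: look up the column whose
# name is stored in var_name.
values <- df[[var_name]]
print(values)
```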

With these new terms we can make the problem of indirection clearer: we have a data-variable (carat) stored inside an env-variable (input$var), and we need some way to tell dplyr this. There are two slightly different ways to do this depending on whether the function you’re working with is a “data-masking” function or a “tidy-selection” function.
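
As a rough preview, and not necessarily the exact approach this chapter goes on to develop, the sketch below shows one way to express the indirection in each kind of function: the .data pronoun inside filter() (a data-masking function) and all_of() inside select() (a tidy-selection function). The names var_name and threshold are hypothetical stand-ins for input$var and input$min.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

var_name <- "carat"  # hypothetical stand-in for input$var
threshold <- 1       # hypothetical stand-in for input$min

# Data-masking: .data[[var_name]] looks up the column whose name is
# stored in var_name, so the comparison uses the carat values.
filtered <- diamonds %>% filter(.data[[var_name]] > threshold)

# Tidy-selection: all_of() selects the columns named in a character
# vector.
selected <- diamonds %>% select(all_of(var_name))

print(head(filtered))
print(names(selected))
```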