21.8 SQL basics

The top-level components of SQL are called statements. Common statements include CREATE for defining new tables, INSERT for adding data, and SELECT for retrieving data. We will focus on SELECT statements, also called queries, because they are almost exclusively what you’ll use as a data scientist.

flights |> show_query()
## <SQL>
## SELECT *
## FROM flights
planes |> show_query()
## <SQL>
## SELECT *
## FROM planes
  • WHERE and ORDER BY control which rows are included and how they are ordered:
flights |> 
  filter(dest == "IAH") |> 
  arrange(dep_delay) |>
  show_query()
## <SQL>
## SELECT flights.*
## FROM flights
## WHERE (dest = 'IAH')
## ORDER BY dep_delay
  • GROUP BY converts the query to a summary, causing aggregation to happen:
flights |> 
  group_by(dest) |> 
  summarize(dep_delay = mean(dep_delay, na.rm = TRUE)) |> 
  show_query()
## <SQL>
## SELECT dest, AVG(dep_delay) AS dep_delay
## FROM flights
## GROUP BY dest