22.17 dplyr and arrow (6)

Performance (2)

  • Let’s see time to getting number of books checked out per month in 2021 using the parquet files instead
seattle_pq |> 
  filter(CheckoutYear == 2021, MaterialType == "BOOK") |>
  group_by(CheckoutMonth) |>
  summarize(TotalCheckouts = sum(Checkouts)) |>
  arrange(desc(CheckoutMonth)) |>
  collect() |> 
  system.time()
#>    user  system elapsed 
#>   0.263   0.058   0.063