22.16 dplyr and arrow (5)

Performance (1)

  • Let’s see how long it takes getting number of books checked out per month in 2021 via reading in whole CSV
seattle_csv |> 
  filter(CheckoutYear == 2021, MaterialType == "BOOK") |>
  group_by(CheckoutMonth) |>
  summarize(TotalCheckouts = sum(Checkouts)) |>
  arrange(desc(CheckoutMonth)) |>
  collect() |> 
  system.time()
#>    user  system elapsed 
#>  11.951   1.297  11.387