11.8 Data cleaning

Coerce Date to a proper date format (stored in the_date), and treat attendance values of 0 as missing by replacing them with NA

chi_attendance <- chi_attendance |>
     mutate(
          # parse year-month-day values into Date objects
          the_date = ymd(Date),
          # recode attendance of 0 as missing (NA)
          Attendance = na_if(Attendance, 0)
     )
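To see what these two recodings do row by row, here is a minimal sketch on a small hypothetical tibble (the dates and attendance figures are made up for illustration and are not taken from chi_attendance). It assumes dplyr and lubridate are installed.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data, not from the Chicago attendance table
toy <- tibble(
  Date = c("2019-04-08", "2019-04-09", "2019-04-10"),
  Attendance = c(35112, 0, 29871)
)

toy <- toy |>
  mutate(
    the_date = ymd(Date),              # character -> Date
    Attendance = na_if(Attendance, 0)  # the 0 becomes NA; other values unchanged
  )

toy
```

Note that na_if() goes in one direction only: it turns the given value into NA. Replacing NA with 0 would instead call for something like tidyr::replace_na() or coalesce().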

Chicago ballpark attendance comparison

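chi_attendance |> 
     ggplot(aes(
          # day of the week as 1 to 7 (1 = Sunday by default)
          x = wday(the_date), y = Attendance, 
          color = HomeTeam
          )
     ) +
     # jitter horizontally only, so attendance values are not distorted
     geom_jitter(height = 0, width = 0.2, alpha = 0.2) + 
     # one smoothed trend per home team
     geom_smooth() + 
     scale_y_continuous("Attendance") + 
     # label the numeric x axis with abbreviated day names
     scale_x_continuous(
          "Day of the Week", breaks = 1:7, 
          labels = wday(1:7, label = TRUE)
     ) + 
     scale_color_manual(values = crc_fc)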
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## Warning: Removed 10 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).
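The same weekday comparison can also be read off as numbers. The sketch below is a minimal illustration on a hypothetical stand-in for the cleaned table: the rows, the team codes "CHN" and "CHA", and the attendance figures are all assumptions, not data from chi_attendance. It groups by team and weekday and averages attendance. Setting na.rm = TRUE drops the missing games, much as ggplot2 drops them with the warnings shown above.

```r
library(dplyr)
library(lubridate)

# Hypothetical rows standing in for chi_attendance after cleaning
toy <- tibble(
  the_date = ymd(c("2019-04-08", "2019-04-15", "2019-04-13", "2019-04-20")),
  HomeTeam = c("CHN", "CHN", "CHA", "CHA"),
  Attendance = c(30000, NA, 20000, 24000)
)

weekday_means <- toy |>
  group_by(HomeTeam, day = wday(the_date)) |>   # day: 1 = Sunday by default
  summarise(
    mean_attendance = mean(Attendance, na.rm = TRUE),  # ignore missing games
    n_games = sum(!is.na(Attendance)),                 # games actually counted
    .groups = "drop"
  )

weekday_means
```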