4.3 Quick Intuition on Collective Geoms

  • Last chapter was on individual geoms. This chapter is on collective geoms.
  • Oversimplification (but maybe useful)
    • individual numbers vs the sum of the numbers
      • sum converts a series of numbers (“individual”): 4, 7, 9, 3, 3
      • to a single number (“collective”): 26
    • home prices
      • under individual geoms each home price has a point on a plot/table
      • under collective geoms we may use median as a single number that summarizes all individuals
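The two oversimplified examples above can be sketched in a few lines of base R. The home prices here are made-up values used only for illustration; they do not come from the book or the blog post.

```r
# Individual values: one number per observation
individual <- c(4, 7, 9, 3, 3)

# Collective value: a single number that summarizes all of them
collective <- sum(individual)
collective
# [1] 26

# Hypothetical home prices (made-up values for illustration only)
home_prices <- c(250000, 310000, 185000, 420000, 295000)

# The median is one possible collective summary of the individual prices
median_price <- median(home_prices)
median_price
```

Any summary function (mean, median, sum) plays the same role: many individual values in, one collective value out.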

This blog post by Simon Jackson illustrates these foundations using mtcars. The points are individual geoms and the bars are a collective geom showing the average of the individual observations.

library(dplyr)    # %>%, mutate(), group_by(), summarise(), as_tibble()
library(ggplot2)

id <- mtcars %>% 
  tibble::rownames_to_column() %>%
  as_tibble() %>%
  mutate(am = factor(am, levels = c(0, 1), labels = c("automatic", "manual")))

gd <- id %>% 
        group_by(am) %>% 
        summarise(hp = mean(hp))

ggplot(id, aes(x = am, y = hp, color = am, fill = am)) +
  geom_col(data = gd, alpha = 0.3) +
  ggrepel::geom_text_repel(aes(label = rowname), color = "black", size = 2.5, segment.color = "grey") +
  geom_point() +
  guides(color = "none", fill = "none") +
  theme_bw() +
  labs(
    title = "Car horsepower by transmission type",
    x = "Transmission",
    y = "Horsepower"
  )
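As a rough check on what the bars represent, the group means can be recomputed with base R alone. This is only a sketch using the built-in mtcars data; the object names bar_heights and n_points are illustrative and do not appear in the blog post.

```r
# Recode transmission the same way as above (0 = automatic, 1 = manual)
am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Collective: one mean horsepower per transmission type (the bar heights)
bar_heights <- tapply(mtcars$hp, am, mean)

# Individual: one horsepower value per car (the points)
n_points <- nrow(mtcars)

bar_heights
n_points
```

The plot therefore shows as many points as there are cars, but only two bars, one per transmission type.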

Next, a longitudinal example from the same blog post, included because the book's example is also a longitudinal study. It uses the financing_healthcare dataset from the ourworldindata package, which records healthcare spending per country over time.

#library(devtools)
#install_github("drsimonj/ourworldindata")

library(ourworldindata)

id <- financing_healthcare %>% 
        filter(continent %in% c("Oceania", "Europe") & between(year, 2001, 2005)) %>% 
        select(continent, country, year, health_exp_total) %>% 
        na.omit()
  • raw data
id
## # A tibble: 275 × 4
##    continent country  year health_exp_total
##    <chr>     <chr>   <int>            <dbl>
##  1 Europe    Albania  2001             198.
##  2 Europe    Albania  2002             225.
##  3 Europe    Albania  2003             236.
##  4 Europe    Albania  2004             264.
##  5 Europe    Albania  2005             277.
##  6 Europe    Andorra  2001            1432.
##  7 Europe    Andorra  2002            1565.
##  8 Europe    Andorra  2003            1601.
##  9 Europe    Andorra  2004            1662.
## 10 Europe    Andorra  2005            1794.
## # ℹ 265 more rows
  • each row is one observation at the combined country-year level. For the purposes of plotting, though, the “individual geom” is one line per country, connecting all of that country's yearly observations.
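To make the two levels concrete, here is a small sketch with a made-up data frame (the countries and values are hypothetical and are not taken from financing_healthcare). Each row is a country-year observation, but the number of lines drawn with a per-country grouping equals the number of countries.

```r
# Hypothetical toy data: 2 countries x 3 years
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2001:2003, times = 2),
  value   = c(10, 12, 15, 20, 21, 25)
)

# Rows are unique at the country-year level
n_obs <- nrow(toy)
no_duplicates <- anyDuplicated(toy[c("country", "year")]) == 0

# Grouping by country yields one line per country
n_lines <- length(unique(toy$country))

n_obs
n_lines
```

In the plot code, this per-country grouping is what aes(group = country) requests.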
gd <- id %>% 
        group_by(continent, year) %>% 
        summarise(health_exp_total = mean(health_exp_total))


ggplot(id, aes(x = year, y = health_exp_total, color = continent)) +
  geom_line(aes(group = country), alpha = 0.3) +
  geom_line(data = gd, alpha = 0.8, linewidth = 3) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_bw() +
  labs(
    title = "Changes in healthcare spending\nacross countries and world regions",
    x = NULL,
    y = "Total healthcare investment ($)",
    color = NULL
  )