12.9 Data Set 2

In 2017, Cards Against Humanity Saves America launched a series of monthly surveys to get the “Pulse of the Nation.” We consider the following variables:

  • Y: number of books somebody has read in the past year
  • X1: age
  • X2: whether they’d rather be wise but unhappy or happy but unwise

$$
X_2 = \begin{cases}
1 & \text{wise but unhappy} \\
0 & \text{happy but unwise}
\end{cases}
$$

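# Load packages and data
library(bayesrules)  # pulse_of_the_nation data
library(tidyverse)   # %>%, filter(), ggplot()
library(patchwork)   # combine ggplots with +
data(pulse_of_the_nation)
pulse <- pulse_of_the_nation %>% 
  filter(books < 100) # drop extreme outliers (100 or more books)

# Distribution of books, then books against each predictor
p1 <- ggplot(pulse, aes(x = books)) + 
  geom_histogram(color = "white")
p2 <- ggplot(pulse, aes(y = books, x = age)) + 
  geom_point()
p3 <- ggplot(pulse, aes(y = books, x = wise_unwise)) + 
  geom_boxplot()

# Show the three plots side by side (patchwork)
p1 + p2 + p3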

12.9.1 Poisson Regression

Should we model books with Poisson regression?
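For reference, the Poisson regression model under consideration can be sketched as follows. The symbols $\lambda_i$, $\beta_0$, $\beta_1$, and $\beta_2$ are notation assumed here for illustration; they do not appear in the text above.

```latex
\begin{aligned}
Y_i \mid \beta_0, \beta_1, \beta_2 &\sim \text{Pois}(\lambda_i) \\
\log(\lambda_i) &= \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2}
\end{aligned}
```

A Poisson model implies that, for any given combination of predictors, the mean and variance of $Y$ are equal: $E(Y_i) = \text{Var}(Y_i) = \lambda_i$. Whether that assumption is reasonable for the number of books read is the question at hand.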

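library(rstanarm)  # stan_glm(), pp_check()

# Simulate the posterior of a Poisson regression of books on age and wise_unwise
books_poisson_sim <- stan_glm(
  books ~ age + wise_unwise, 
  data = pulse, family = poisson,
  prior_intercept = normal(0, 2.5, autoscale = TRUE),
  prior = normal(0, 2.5, autoscale = TRUE), 
  # (the Poisson model has no auxiliary parameter)
  prior_aux = exponential(1, autoscale = TRUE),
  chains = 4, iter = 5000*2, seed = 84735)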

12.9.2 Posterior Predictive Check

pp_check(books_poisson_sim) + 
  xlab("books")

12.9.3 Overdispersion

A random variable Y is overdispersed if the observed variability in Y exceeds the variability expected by the assumed probability model of Y.
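As a rough illustration of this definition, the sketch below compares the sample variance to the sample mean for simulated counts. Under a Poisson model the two should be about equal, so a ratio well above 1 suggests overdispersion. The helper `dispersion_ratio()` and the simulated vectors `y_pois` and `y_nb` are hypothetical names introduced here; the data are simulated, not taken from the survey.

```r
# Hypothetical helper: ratio of sample variance to sample mean.
# Under a Poisson model this ratio should be close to 1.
dispersion_ratio <- function(y) {
  var(y) / mean(y)
}

set.seed(84735)
n <- 10000

# Simulated counts that follow a Poisson model (mean = variance = 10)
y_pois <- rpois(n, lambda = 10)

# Simulated overdispersed counts: negative binomial with the same mean,
# but variance mu + mu^2 / size = 10 + 100 / 1 = 110
y_nb <- rnbinom(n, mu = 10, size = 1)

ratio_pois <- dispersion_ratio(y_pois)
ratio_nb <- dispersion_ratio(y_nb)

print(c(poisson = ratio_pois, neg_binomial = ratio_nb))
```

The same helper could be applied to `pulse$books`. This is only a marginal check: Poisson regression assumes equal mean and variance conditional on the predictors, so a large ratio is a warning sign rather than proof.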