9.7 Posterior prediction
Quiz
Suppose a weather report indicates that tomorrow will be a 75-degree day in D.C. What’s your posterior guess of the number of riders that Capital Bikeshare should anticipate?
Answer:
One option is to plug 75 into the posterior median model:
\[ -2194.24 + 82.16 \times 75 = 3967.76 \]
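But this single number ignores two sources of variability:
Sampling variability: ridership on individual days varies around the average
Posterior variability: the model parameters themselves are uncertain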
9.7.1 Building a posterior predictive model
The posterior predictive model takes both kinds of variability into account. We can approximate it with our 20,000 posterior samples of the parameters, simulating one prediction from each parameter set.
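As a sketch of what this approximation does, each parameter set in the chain yields one mean and one simulated prediction at 75 degrees. The chain index \(i\) and the superscript notation are introduced here for illustration only; \(\beta_0\) and \(\beta_1\) stand for the (Intercept) and temp_feel coefficients.
\[ \mu^{(i)} = \beta_0^{(i)} + \beta_1^{(i)} \times 75, \qquad Y_{\text{new}}^{(i)} \mid \beta_0^{(i)}, \beta_1^{(i)}, \sigma^{(i)} \sim N\left(\mu^{(i)}, \left(\sigma^{(i)}\right)^2\right), \qquad i = 1, \ldots, 20000 \]
The spread of the \(\mu^{(i)}\) reflects posterior variability alone, while the spread of the \(Y_{\text{new}}^{(i)}\) reflects both posterior and sampling variability.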
# Predict rides for each parameter set in the chain
set.seed(84735)
predict_75 <- bike_model_df %>%
  mutate(mu = `(Intercept)` + temp_feel*75,             # <- at 75 degrees
         y_new = rnorm(20000, mean = mu, sd = sigma))   # <- sampling var
head(predict_75, 3)
  (Intercept) temp_feel sigma   mu y_new
1       -2657     88.16  1323 3955  4838
2       -2188     83.01  1323 4038  3874
3       -1984     81.54  1363 4132  5196
The interesting comparison is mu (\(\mu\)) vs. y_new (\(Y_{\text{new}}\)).
# Construct 95% posterior credible intervals
# (the 0.025 and 0.975 quantiles bound the middle 95%)
predict_75 %>%
  summarize(lower_mu = quantile(mu, 0.025),
            upper_mu = quantile(mu, 0.975),
            lower_new = quantile(y_new, 0.025),
            upper_new = quantile(y_new, 0.975))
  lower_mu upper_mu lower_new upper_new
1     3843     4095      1500      6482
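\(\mu\) is the average ridership across all 75-degree days
\(Y_{\text{new}}\) is the ridership on one specific 75-degree day
\(\Rightarrow\) We can predict an average more precisely than a single data point: the interval for \(\mu\) (3843 to 4095) is far narrower than the interval for \(Y_{\text{new}}\) (1500 to 6482).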
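The gap between the two intervals can be reproduced in a small self-contained sketch. The draws below are simulated stand-ins, not the actual chain in bike_model_df: the means and standard deviations are assumptions chosen only to roughly resemble the output above, and interval_width() is a hypothetical helper.

```r
# Hypothetical stand-in for the chain: these draws are simulated here,
# not taken from bike_model_df.
set.seed(84735)
n_draws <- 20000
mu    <- rnorm(n_draws, mean = 3970, sd = 65)        # assumed draws of the mean at 75 degrees
sigma <- abs(rnorm(n_draws, mean = 1300, sd = 20))   # assumed draws of sigma, kept positive

# One simulated day per parameter draw: adds sampling variability on top of mu
y_new <- rnorm(n_draws, mean = mu, sd = sigma)

# Width of the central credible interval at the given level
interval_width <- function(x, level = 0.95) {
  tail_prob <- (1 - level) / 2
  unname(diff(quantile(x, c(tail_prob, 1 - tail_prob))))
}

width_mu  <- interval_width(mu)
width_new <- interval_width(y_new)
c(width_mu = width_mu, width_new = width_new)
```

Because y_new inherits all the variability in mu and then adds day-to-day noise of size sigma, its interval is always the wider of the two.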