Leave-one-out cross validation

  • Naive implementation requires n fits, one for each held out data point.

  • To compare models, we can use the elpd (expected log predictive density), which summarizes how well a model predicts new data: the sum, over data points, of the log pointwise predictive density of a new dataset \(y\) under the fitted model. It can be estimated by cross validation (https://arxiv.org/pdf/1507.04544.pdf):

\[ \text{elpd}_{\text{loo}}=\sum_{i=1}^n\log{p(y_i|y_{-i})} \] where

\[ p(y_i|y_{-i}) = \int p(y_i|\theta)\,p(\theta|y_{-i})\,d\theta \] is the leave-one-out predictive density given the data without the ith data point.

Doing this brute force is … brutal.

scores <- rep(0, nrow(kidiq))
for (i in 1:nrow(kidiq)) {
  # Refit the model without the ith observation
  fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq[-i, ], refresh = 0)
  # Posterior draws of the linear predictor at the held-out point
  pred <- posterior_linpred(fit, newdata = kidiq[i, ])
  sigma <- as.data.frame(fit)$sigma
  if (i %% 50 == 0) print(i)
  # Average the density of the held-out point over the posterior draws, then log
  scores[i] <- log(mean(dnorm(kidiq$kid_score[i], mean = pred, sd = sigma)))
}
sum(scores)

# -1875.999
# Standard error of the elpd_loo estimate
sd(scores) * sqrt(nrow(kidiq))

# 14.225
  • The loo package (loaded with rstanarm) has a function loo that can shortcut this using an importance-sampling approximation (Pareto smoothed importance sampling), so only the single full-data fit is needed.

  • This function returns elpd_loo:

fit_3 <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq, refresh = 0)
loo_3 <- loo(fit_3)
loo_3
## 
## Computed from 4000 by 434 log-likelihood matrix.
## 
##          Estimate   SE
## elpd_loo  -1876.0 14.2
## p_loo         4.0  0.4
## looic      3752.0 28.5
## ------
## MCSE of elpd_loo is 0.0.
## MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 1.2]).
## 
## All Pareto k estimates are good (k < 0.7).
## See help('pareto-k-diagnostic') for details.
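The pointwise quantities behind that summary can be inspected directly; a sketch, assuming the loo_3 object from above:

```r
# Pareto k values, one per observation; values above 0.7 flag points where
# the importance-sampling approximation is unreliable
k <- loo_3$diagnostics$pareto_k
summary(k)

# The pointwise elpd contributions are also available
head(loo_3$pointwise[, "elpd_loo"])
```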
  • The elpd_loo value has no absolute interpretation; it is mostly useful for comparing models.

  • One diagnostic message worth noting: if you get a warning that the Pareto k estimates are unstable, consider k-fold cross validation instead.
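rstanarm provides kfold() for this; a sketch, assuming the fit_3 object from above (K = 10 is an arbitrary choice):

```r
# Refits the model 10 times, each time holding out a tenth of the data;
# exact but much slower than the PSIS approximation used by loo()
kfold_3 <- kfold(fit_3, K = 10)
kfold_3
```

The resulting object can be passed to loo_compare() in place of a loo object.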

  • To compare two models:

fit_1 <- stan_glm(kid_score ~ mom_hs, data = kidiq, refresh = 0)
loo_1 <- loo(fit_1)
loo_compare(loo_3, loo_1)
##       elpd_diff se_diff
## fit_3   0.0       0.0  
## fit_1 -38.8       8.4
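As a rough rule of thumb, an elpd_diff several times larger in magnitude than its se_diff (here 38.8 vs 8.4) indicates a clear preference for the first-listed model. The ratio can be computed from the comparison table:

```r
cmp <- loo_compare(loo_3, loo_1)
# Difference relative to its standard error for the worse-fitting model
cmp["fit_1", "elpd_diff"] / cmp["fit_1", "se_diff"]
```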