Choosing the best model

  • You have to punish models for having too many predictors

  • Whatever the method, training RSS decreases (and $R^2$ increases) as we go from $M_k$ to $M_{k+1}$. Thus $M_p$, the model with all $p$ predictors, always wins that contest.

  • Going with $M_p$ provides neither of the benefits of model selection: interpretability and variance reduction (less overfitting)

  • We’ll need to estimate test error!
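The claim that training RSS can only go down as predictors are added can be checked numerically. This is a minimal sketch, assuming NumPy and synthetic data in which only the first two of eight predictors matter. The nested models here simply use the first k columns, an illustrative stand-in for whatever selection procedure produced each model, and the helper name `train_rss` is hypothetical.

```python
import numpy as np

def train_rss(X, y, k):
    """Training RSS of a least-squares fit on an intercept plus the first k columns of X."""
    A = np.column_stack([np.ones(len(y)), X[:, :k]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

# Synthetic data (assumption): only predictors 0 and 1 carry signal.
rng = np.random.default_rng(0)
n, p = 50, 8
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# RSS and R^2 for the nested models with k = 0, 1, ..., p predictors.
rss_path = [train_rss(X, y, k) for k in range(p + 1)]
tss = float(((y - y.mean()) ** 2).sum())
r2_path = [1.0 - rss / tss for rss in rss_path]

for k, (rss, r2) in enumerate(zip(rss_path, r2_path)):
    print(f"k={k}  RSS={rss:8.3f}  R2={r2:.4f}")
```

The printed RSS never increases with k, even for the six noise predictors, which is why training fit alone cannot pick the model size.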

Adjustment Methods

  • $C_p = \frac{1}{n}\left(\mathrm{RSS} + 2k\hat{\sigma}^2\right)$
  • $\hat{\sigma}^2$ is an estimate of the variance of the error $\epsilon$ associated with each response measurement
    • typically estimated using $M_p$
    • if $p \approx n$, the estimate is going to be poor or even zero
  • $\mathrm{AIC} = 2k - 2\ln(\hat{L})$
  • $\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})$
  • adjusted $R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \cdot \frac{n-1}{n-k-1}$
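The four criteria above are simple enough to write out directly. This is a rough sketch using only the standard library; the function names are hypothetical. Here `k` is the number of predictors, `n` the number of observations, and `sigma2_hat` the error-variance estimate. The helper `gaussian_log_likelihood` is an added assumption: it gives the maximized log-likelihood of a linear model with Gaussian errors, so that AIC and BIC can be computed from RSS.

```python
import math

def mallows_cp(rss, n, k, sigma2_hat):
    # C_p = (RSS + 2 k sigma^2) / n
    return (rss + 2 * k * sigma2_hat) / n

def gaussian_log_likelihood(rss, n):
    # Assumption: Gaussian errors, with the variance set to its
    # maximum-likelihood value RSS / n.
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)

def aic(log_lik, k):
    # AIC = 2k - 2 ln(L)
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    # BIC = k ln(n) - 2 ln(L)
    return k * math.log(n) - 2 * log_lik

def adjusted_r2(rss, tss, n, k):
    # adjusted R^2 = 1 - (RSS / TSS) * (n - 1) / (n - k - 1)
    return 1 - (rss / tss) * (n - 1) / (n - k - 1)

# Illustrative numbers only (not from any real data set).
rss, tss, n, k, sigma2_hat = 40.0, 100.0, 50, 3, 1.0
log_lik = gaussian_log_likelihood(rss, n)
print(mallows_cp(rss, n, k, sigma2_hat))
print(aic(log_lik, k), bic(log_lik, k, n))
print(adjusted_r2(rss, tss, n, k))
```

Lower is better for the first three, higher for adjusted R2. Some references also count the intercept and the error variance in `k`. That shifts AIC and BIC by a constant and leaves the ranking of models unchanged.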

Avoiding Adjustment Methods

  • $\hat{\sigma}^2$ can be hard to come by
  • adjustment methods make assumptions about the true model (e.g. Gaussian errors)
  • so cross-validate!
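Cross-validation estimates test error directly, with no variance estimate and no distributional assumption. This is a minimal sketch of K-fold cross-validation for choosing the model size, assuming NumPy, synthetic data, and nested models built from the first k columns. The function name `kfold_mse` and the choice of 5 folds are illustrative assumptions.

```python
import numpy as np

def kfold_mse(X, y, k, n_folds=5, seed=0):
    """K-fold CV estimate of test MSE for the model using the first k columns of X."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    sq_err = 0.0
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        A_train = np.column_stack([np.ones(len(train)), X[train, :k]])
        A_test = np.column_stack([np.ones(len(held_out)), X[held_out, :k]])
        # Fit on the training folds only, then score on the held-out fold.
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        sq_err += float(((y[held_out] - A_test @ beta) ** 2).sum())
    return sq_err / n

# Synthetic data (assumption): only predictors 0 and 1 carry signal.
rng = np.random.default_rng(1)
n, p = 100, 8
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

cv_errors = [kfold_mse(X, y, k) for k in range(p + 1)]
best_k = int(np.argmin(cv_errors))
print(best_k, [round(e, 3) for e in cv_errors])
```

Unlike training RSS, the cross-validated error is not guaranteed to fall as k grows, so its minimum can serve as the selection rule.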