Ridge Regression

  • Ridge regression is very similar to least squares, except that the coefficients are estimated by minimizing a slightly different quantity

  • \hat{\beta}^{OLS} = \operatorname{argmin}_{\hat{\beta}}(\mathrm{RSS})

  • \hat{\beta}^{R} = \operatorname{argmin}_{\hat{\beta}}\left(\mathrm{RSS} + \lambda \sum_{k=1}^{p} \beta_k^2\right)

  • \lambda is a tuning parameter (hyperparameter) that controls the strength of the shrinkage penalty

  • there’s one model parameter the penalty doesn’t shrink: the intercept

    • \hat{\beta}_0
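The two objectives above can be compared numerically. Below is a minimal NumPy sketch (the data and variable names are my own, not from the text) that uses the closed-form ridge solution \hat{\beta}^R = (X^T X + \lambda I)^{-1} X^T y, centering X and y so the intercept \hat{\beta}_0 is left unpenalized:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def ridge_fit(X, y, lam):
    # Center X and y so the intercept is estimated separately and not penalized
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    # Closed form: beta = (X'X + lam*I)^{-1} X'y
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b0 = y.mean() - X.mean(axis=0) @ beta
    return b0, beta

b0_ols, beta_ols = ridge_fit(X, y, 0.0)   # lam = 0 recovers least squares
b0_r, beta_r = ridge_fit(X, y, 10.0)      # lam > 0 shrinks the coefficients
```

With lam = 0 the penalty vanishes and the fit coincides with OLS; increasing lam pulls the coefficient vector toward zero (smaller L2 norm) without ever zeroing it exactly.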

Ridge Regression, Visually

Note the decrease in test MSE, and further that this is not computationally expensive: “One can show that computations required to solve (6.5), simultaneously for all values of \lambda, are almost identical to those for fitting a model using least squares.”
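One way to see the quoted claim about computational cost: after a single SVD of X, the ridge solution for any \lambda costs only O(p) extra arithmetic per value, via the identity \hat{\beta}^R(\lambda) = V \operatorname{diag}\!\big(s_j/(s_j^2+\lambda)\big) U^T y. A sketch (my own illustration, assuming centered data with no intercept for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# One SVD of X suffices for every value of lambda
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uty = U.T @ y

def ridge_path(lams):
    # beta(lam) = V diag(s / (s^2 + lam)) U'y  -- cheap per-lambda update
    return [Vt.T @ (s / (s**2 + lam) * Uty) for lam in lams]

betas = ridge_path([0.01, 1.0, 100.0])
```

The shrink factor s_j/(s_j^2 + \lambda) decreases as \lambda grows, so the coefficient norm shrinks monotonically along the path, and \lambda \to 0 recovers the least squares fit.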

Preprocessing

Note that the ridge estimates \hat{\beta}_j^R aren’t scale invariant, so standardize each predictor: \tilde{x}_{ij} = \frac{x_{ij}}{\sqrt{\frac{1}{n}\sum_{i=1}^n{(x_{ij} - \bar{x}_j)^2}}}

  • It is best to apply ridge regression after standardizing the predictors, using the formula above
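A small illustration of why standardization matters (my own example, not from the text): two equally important predictors measured on very different scales get very different penalization unless each column is divided by its standard deviation first:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
# Two equally informative predictors on very different scales
X = np.column_stack([rng.normal(size=n), 1000 * rng.normal(size=n)])
y = X[:, 0] + X[:, 1] / 1000 + rng.normal(scale=0.1, size=n)

# Standardize: divide each centered column by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

def ridge(X, y, lam):
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

beta_raw = ridge(X, y, 10.0)    # coefficients on incomparable scales
beta_std = ridge(X_std, y, 10.0)  # comparable, similarly shrunk coefficients
```

On the raw data the large-scale predictor's coefficient is tiny and barely feels the penalty, while after standardization both coefficients are comparable and shrunk by the same factor.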