Best Subset Selection (BSS)

  • Most straightforward approach - try them all!
  • To perform best subset selection, we fit a separate least squares regression for each possible combination of the p predictors.
  • That is, we fit all \(p\) models that contain exactly one predictor, all \(\binom{p}{2} = p(p-1)/2\) models that contain exactly two predictors, and so forth.
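  • For example, with \(p = 4\) predictors there are \(\binom{4}{1} = 4\) one-predictor models and \(\binom{4}{2} = \frac{4 \cdot 3}{2} = 6\) two-predictor models.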

BSS Algorithm

  1. Start with the null model (intercept-only model), \(\mathcal{M}_0\).
  2. For \(k = 1, 2, \ldots, p\):
  • Fit all \(\binom{p}{k}\) models containing exactly \(k\) predictors.
  • Let \(\mathcal{M}_k\) denote the best of these \(\binom{p}{k}\) models, where best is defined as having the lowest RSS (or, more generally, the lowest deviance).
  3. Choose the best model among \(\mathcal{M}_0, \ldots, \mathcal{M}_p\), where best is defined as having the lowest \(C_p\), \(BIC\), \(AIC\), or cross-validated MSE, or, alternatively, the highest adjusted \(R^2\).
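A minimal NumPy sketch of this algorithm on synthetic data is below; the data-generating setup and the choice of BIC for the final comparison are illustrative assumptions, and any of the criteria listed in step 3 could be swapped in.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only the first two of p = 5 predictors are relevant.
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

def rss(X_sub):
    """RSS of the least squares fit on X_sub, with an intercept column."""
    A = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid

# Step 1: the null model M_0 (intercept only).
best_by_size = {(): rss(np.empty((n, 0)))}

# Step 2: for each size k, keep the lowest-RSS model among all C(p, k) fits.
for k in range(1, p + 1):
    fits = {s: rss(X[:, list(s)]) for s in itertools.combinations(range(p), k)}
    best = min(fits, key=fits.get)
    best_by_size[best] = fits[best]

# Step 3: compare M_0, ..., M_p using BIC (one common Gaussian form):
# BIC = n * log(RSS / n) + log(n) * (number of fitted coefficients).
def bic(rss_val, num_predictors):
    return n * np.log(rss_val / n) + np.log(n) * (num_predictors + 1)

winner = min(best_by_size, key=lambda s: bic(best_by_size[s], len(s)))
print("Selected predictors:", winner)  # expect (0, 1) for this data
```

Note that step 2 only needs the training RSS, since models of the same size are directly comparable; a penalized criterion like BIC is needed only in step 3, where models of different sizes compete.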

Best Subset Selection (BSS)

  • Pros
    • Exhaustive: guaranteed to find the best model under the chosen criterion, since every possible subset is considered.
  • Cons
    • Prone to overfitting: with such a large search space, some models will fit the training data well purely by chance.
    • Computationally expensive: the number of models grows exponentially as \(2^p\), so the search is intractable for large \(p\) (e.g., \(p = 20\) already yields over 1 million candidate models).
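
The \(2^p\) count follows from summing the per-size counts via the binomial theorem: \(\sum_{k=0}^{p} \binom{p}{k} = 2^p\), so \(p = 20\) gives \(2^{20} = 1{,}048{,}576\) models to fit.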