Best Subset Selection (BSS)
- Most straightforward approach - try them all!
- To perform best subset selection, we fit a separate least squares regression for each possible combination of the \(p\) predictors.
- That is, we fit all \(p\) models that contain exactly one predictor, all \(\binom{p}{2} = p(p - 1)/2\) models that contain exactly two predictors, and so forth.
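Summing these counts over all model sizes gives the full size of the search space (the identity is the binomial theorem with \(x = y = 1\)):

\[
\sum_{k=0}^{p} \binom{p}{k} = 2^p
\]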
BSS Algorithm
- Start with the null model (intercept-only model), \(\mathcal{M}_0\).
- For \(k = 1, 2, \ldots, p\):
  - Fit all \(\binom{p}{k}\) models containing exactly \(k\) predictors.
  - Let \(\mathcal{M}_k\) denote the best of these \(\binom{p}{k}\) models, where best is defined as having the lowest RSS (or, for generalized linear models, the lowest deviance).
- Choose the single best model among \(\mathcal{M}_0, \ldots, \mathcal{M}_p\), where best is defined as having the lowest \(C_p\), \(BIC\), \(AIC\), or cross-validated MSE, or, alternatively, the highest adjusted \(R^2\) (see the sketch below).
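A minimal sketch of the procedure in Python, assuming NumPy and scikit-learn are available; the helper names (`best_subset`, `select_by_bic`) and the Gaussian BIC formula are illustrative choices, not part of the lecture:

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression


def rss(model, X, y):
    """Residual sum of squares of a fitted model on (X, y)."""
    resid = y - model.predict(X)
    return float(resid @ resid)


def best_subset(X, y):
    """Return M_0, ..., M_p: the lowest-RSS model of each size k."""
    n, p = X.shape
    # M_0: the null model predicts the mean of y for every observation.
    models = {0: ((), float(((y - y.mean()) ** 2).sum()))}
    for k in range(1, p + 1):
        fits = []
        # Fit all C(p, k) least squares models with exactly k predictors.
        for subset in combinations(range(p), k):
            cols = list(subset)
            fit = LinearRegression().fit(X[:, cols], y)
            fits.append((subset, rss(fit, X[:, cols], y)))
        # M_k: among size-k models, keep the one with the lowest RSS.
        models[k] = min(fits, key=lambda t: t[1])
    return models


def select_by_bic(models, n):
    """Choose among M_0, ..., M_p by BIC (Gaussian likelihood, up to constants)."""
    def bic(k, rss_k):
        # k coefficients plus one intercept are estimated.
        return n * np.log(rss_k / n) + np.log(n) * (k + 1)

    return min(models.items(), key=lambda kv: bic(kv[0], kv[1][1]))


# Toy data: 50 observations, 6 candidate predictors, 2 truly relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=50)

k, (subset, _) = select_by_bic(best_subset(X, y), n=50)
print(f"BIC chooses k = {k}, predictors {subset}")
```

Swapping `select_by_bic` for cross-validated MSE or adjusted \(R^2\) changes only the final step; the exhaustive search over subsets stays the same.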
Best Subset Selection (BSS)
- Pros
  - Exhaustive: guaranteed to find the best model of each size under the chosen criterion.
- Cons
  - Overfitting due to the large search space.
  - Computationally expensive, intractable for large \(p\): the number of candidate models grows exponentially, \(2^p\) in total (e.g., \(p = 20\) yields \(2^{20} = 1{,}048{,}576\) possibilities, over a million).
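As a quick check on that count, a one-off sketch using only the Python standard library (`math.comb` is the binomial coefficient):

```python
from math import comb

p = 20
# Sum of C(p, k) over k = 0..p equals 2^p by the binomial theorem.
print(sum(comb(p, k) for k in range(p + 1)))  # 1048576
print(2 ** p)                                 # 1048576
```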