Best Subset Selection (BSS)
- Most straightforward approach: try them all!
- To perform best subset selection, we fit a separate least squares regression for each possible combination of the $p$ predictors.
- That is, we fit all $p$ models that contain exactly one predictor, all $\binom{p}{2} = p(p-1)/2$ models that contain exactly two predictors, and so forth.
BSS Algorithm
- Start with the null model (the intercept-only model), $M_0$.
- For $k = 1, 2, \ldots, p$:
  - Fit all $\binom{p}{k}$ models containing exactly $k$ predictors.
  - Let $M_k$ denote the best of these $\binom{p}{k}$ models, where best is defined as having the lowest RSS (or, more generally, the lowest deviance).
- Choose the best model among $M_0, \ldots, M_p$, where best is defined as having the lowest $C_p$, AIC, BIC, or cross-validated MSE, or, alternatively, the highest adjusted $R^2$ (see the sketch below).
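Below is a minimal sketch of this procedure in Python with NumPy. The synthetic data, the `rss` helper, and the use of BIC for the final comparison are illustrative assumptions; any of the criteria listed above would work in its place.

```python
import itertools
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return float(resid @ resid)

def best_subset(X, y):
    """Return the predictor indices chosen by best subset selection."""
    n, p = X.shape
    # M_0: the null (intercept-only) model.
    best_per_size = {0: ((), rss(X[:, []], y))}
    # For k = 1, ..., p: fit all C(p, k) models with exactly k predictors
    # and keep the one with the lowest RSS as M_k.
    for k in range(1, p + 1):
        fits = ((s, rss(X[:, list(s)], y))
                for s in itertools.combinations(range(p), k))
        best_per_size[k] = min(fits, key=lambda t: t[1])
    # Compare M_0, ..., M_p using BIC (Gaussian likelihood, k + 1 parameters).
    def bic(k, r):
        return n * np.log(r / n) + (k + 1) * np.log(n)
    k_star = min(best_per_size, key=lambda k: bic(k, best_per_size[k][1]))
    return best_per_size[k_star][0]

# Example: five noisy predictors, only the first two carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
print(best_subset(X, y))  # typically selects (0, 1)
```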
Best Subset Selection (BSS)
- Pros
  - Conceptually simple: enumerate every candidate model.
  - Exhaustive, so it is guaranteed to find the best-scoring model of each size.
- Cons
  - Overfitting due to the large search space.
  - Computationally expensive, intractable for large $p$: the search space grows exponentially as $2^p$ (e.g., $p = 20$ yields over 1 million candidate models; see the quick check below).
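A quick check of the $2^p$ count from the last bullet:

```python
# The search space sums C(p, k) over k = 0, ..., p, which equals 2^p.
from math import comb

p = 20
print(sum(comb(p, k) for k in range(p + 1)))  # 1048576
print(2 ** p)                                 # the same count: over 1 million
```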