Best Subset Selection (BSS)

  • Most straightforward approach - try them all!
  • To perform best subset selection, we fit a separate least squares regression for each possible combination of the p predictors.
  • That is, we fit all p models that contain exactly one predictor, all $\binom{p}{2} = p(p-1)/2$ models that contain exactly two predictors, and so forth.

BSS Algorithm

  1. Start with the null model (intercept-only model), $M_0$.
  2. For $k = 1, 2, \ldots, p$:
    • Fit all $\binom{p}{k}$ models containing exactly $k$ predictors.
    • Let $M_k$ denote the best of these $\binom{p}{k}$ models, where best is defined as having the lowest RSS, lowest deviance, etc.
  3. Choose the best model among $M_0, \ldots, M_p$, where best is defined as having the lowest $C_p$, BIC, AIC, or cross-validated MSE, or, alternatively, the highest adjusted $R^2$.
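
The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function names `rss` and `best_subset` are made up for this example, the fit is plain least squares via NumPy, and step 3 uses BIC in its Gaussian form $n \log(\mathrm{RSS}/n) + (k+1)\log n$ (up to an additive constant) as one of the criteria listed above. Any of the other criteria could be swapped in.

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """RSS of a least squares fit on the chosen columns plus an intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subset(X, y):
    """Return ({k: (columns, RSS)} for each size k, size chosen by BIC)."""
    n, p = X.shape

    # Step 1: the null model M_0 contains only the intercept.
    best_per_size = {0: ((), rss(X, y, ()))}

    # Step 2: for each size k, fit all C(p, k) models and keep the lowest-RSS one.
    for k in range(1, p + 1):
        candidates = ((cols, rss(X, y, cols)) for cols in combinations(range(p), k))
        best_per_size[k] = min(candidates, key=lambda pair: pair[1])

    # Step 3: compare M_0, ..., M_p with a criterion that penalizes model size.
    # Assumption: Gaussian BIC up to a constant; assumes RSS > 0 for every M_k.
    def bic(k):
        return n * np.log(best_per_size[k][1] / n) + (k + 1) * np.log(n)

    k_star = min(best_per_size, key=bic)
    return best_per_size, k_star
```

Note that step 2 compares models of the same size, so training RSS is a fair yardstick there. Step 3 compares models of different sizes, where RSS always favors the largest model, which is why a penalized criterion or cross-validation is needed.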

Best Subset Selection (BSS)

  • Pros
    • Examines every candidate, so it is guaranteed to find the best subset of each size (lowest training RSS)
  • Cons
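    • Prone to overfitting: with so many candidate models, some will fit the training data well purely by chance.
    • Computationally expensive, intractable for large p (exponential: $2^p$ models, e.g. p=20 yields $2^{20} = 1{,}048{,}576$, over 1 million possibilities)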
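
To make the exponential growth concrete, the short sketch below counts the models that best subset selection must fit by summing $\binom{p}{k}$ over all sizes $k = 0, \ldots, p$, which equals $2^p$. The helper name `n_models` is chosen for this example only.

```python
from math import comb


def n_models(p):
    """Number of candidate models: sum of C(p, k) over k = 0..p, i.e. 2**p."""
    return sum(comb(p, k) for k in range(p + 1))


for p in (10, 20, 30, 40):
    print(f"p = {p:2d}: {n_models(p):,} models")
```

Each additional predictor doubles the count, so going from p=20 to p=40 multiplies the work by roughly a million.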