Best Subset Selection (BSS)

  • Most straightforward approach - try them all!
  • To perform best subset selection, we fit a separate least squares regression for each possible combination of the p predictors.
  • That is, we fit all \(p\) models that contain exactly one predictor, all \(\binom{p}{2} = p(p-1)/2\) models that contain exactly two predictors, and so forth.
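  • For example, with \(p = 4\) predictors there are \(\binom{4}{1} = 4\) one-predictor models and \(\binom{4}{2} = \frac{4 \cdot 3}{2} = 6\) two-predictor models.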

BSS Algorithm

  1. Start with the null model (intercept-only model), \(\mathcal{M}_0\).
  2. For \(k = 1, 2, \ldots, p\):
  • Fit all \(\binom{p}{k}\) models containing exactly \(k\) predictors.
  • Let \(\mathcal{M}_k\) denote the best of these \(\binom{p}{k}\) models, where best is defined as having the lowest RSS (or, more generally, the lowest deviance).
  3. Choose the best model among \(\mathcal{M}_0, \ldots, \mathcal{M}_p\), where best is defined as having the lowest \(C_p\), \(BIC\), \(AIC\), or cross-validated MSE, or, alternatively, the highest adjusted \(R^2\).
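A minimal NumPy sketch of this algorithm on synthetic data is below; the data-generating setup and the choice of BIC for the final comparison are illustrative assumptions, and any of the criteria listed in step 3 could be swapped in.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only the first two of p = 5 predictors are relevant.
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

def rss(X_sub):
    """RSS of the least squares fit on X_sub, with an intercept column."""
    A = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid

# Step 1: the null model M_0 (intercept only).
best_by_size = {(): rss(np.empty((n, 0)))}

# Step 2: for each size k, keep the lowest-RSS model among all C(p, k) fits.
for k in range(1, p + 1):
    fits = {s: rss(X[:, list(s)]) for s in itertools.combinations(range(p), k)}
    best = min(fits, key=fits.get)
    best_by_size[best] = fits[best]

# Step 3: compare M_0, ..., M_p using BIC (one common Gaussian form):
# BIC = n * log(RSS / n) + log(n) * (number of fitted coefficients).
def bic(rss_val, num_predictors):
    return n * np.log(rss_val / n) + np.log(n) * (num_predictors + 1)

winner = min(best_by_size, key=lambda s: bic(best_by_size[s], len(s)))
print("Selected predictors:", winner)  # expect (0, 1) for this data
```

Note that step 2 only needs the training RSS, since models of the same size are directly comparable; a penalized criterion like BIC is needed only in step 3, where models of different sizes compete.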

Best Subset Selection (BSS)

  • Pros
    • Exhaustive: guaranteed to find the best model under the chosen criterion, since every possible subset is considered.
  • Cons
    • Prone to overfitting: with such a large search space, some models will fit the training data well purely by chance.
    • Computationally expensive: the number of models grows exponentially as \(2^p\), so the search is intractable for large \(p\) (e.g., \(p = 20\) already yields over 1 million candidate models).
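
The \(2^p\) count follows from summing the per-size counts via the binomial theorem: \(\sum_{k=0}^{p} \binom{p}{k} = 2^p\), so \(p = 20\) gives \(2^{20} = 1{,}048{,}576\) models to fit.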