Choosing the best model

  • We need to penalize models for using too many predictors

  • Whatever the selection method, the training \(RSS\) decreases / training \(R^2\) increases as we go from \(\mathcal{M}_k\) to \(\mathcal{M}_{k+1}\). Thus, \(\mathcal{M}_p\) always wins that contest (see the sketch after this list).

  • Going with \(\mathcal{M}_p\) provides neither of the benefits we're after: model interpretability and variance reduction (i.e. less overfitting)

  • We’ll need to estimate test error!
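
A quick numerical illustration of the first point above. This is a minimal sketch on synthetic data (the variable names and the use of scikit-learn are my own choices, not from the notes): training \(R^2\) never decreases as we add predictors, even when the extra predictors are pure noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Only the first 3 predictors actually matter; the rest are noise.
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

for k in range(1, p + 1):
    fit = LinearRegression().fit(X[:, :k], y)
    r2 = fit.score(X[:, :k], y)  # training R^2
    print(f"M_{k}: training R^2 = {r2:.4f}")  # non-decreasing in k
```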

Adjustment Methods

  • \(C_p = \frac{1}{n}(RSS + 2k\hat{\sigma}^2)\)
  • \(\hat{\sigma}^2\) is an "estimate of the variance of the error \(\epsilon\) associated with each response measurement"
    • typically estimated using \(\mathcal{M}_p\)
    • if \(p \approx n\), the estimate is going to be poor or even zero.
  • \(AIC = 2k - 2\ln(\hat{L})\)
  • \(BIC = k \cdot \ln(n) - 2\ln(\hat{L})\)
  • adjusted \(R^2 = 1 - \frac{RSS}{TSS} \cdot \frac{n-1}{n-k-1}\) (a sketch computing all four criteria follows this list)
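
A minimal sketch of the four criteria for a Gaussian linear model, following the formulas as written above (\(k\) = number of predictors in the submodel). The Gaussian log-likelihood expression and the scikit-learn fitting are my assumptions about how one would implement this; the notes don't prescribe a tool.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def criteria(X_sub, y, sigma2_hat):
    """Compute C_p, AIC, BIC, adjusted R^2 for one candidate submodel."""
    n, k = X_sub.shape
    fit = LinearRegression().fit(X_sub, y)
    rss = np.sum((y - fit.predict(X_sub)) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    # Maximized Gaussian log-likelihood, with sigma^2 MLE = RSS / n
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return {
        "Cp":     (rss + 2 * k * sigma2_hat) / n,
        "AIC":    2 * k - 2 * log_lik,
        "BIC":    k * np.log(n) - 2 * log_lik,
        "adj_R2": 1 - (rss / tss) * (n - 1) / (n - k - 1),
    }

# sigma2_hat would typically come from the full model M_p, e.g.
# sigma2_hat = rss_full / (n - p - 1)
```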

Avoiding Adjustment Methods

  • \(\hat{\sigma}^2\) can be hard to come by
  • adjustment methods make assumptions about the true model (e.g. Gaussian errors)
  • so cross-validate!
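
A sketch of the cross-validation alternative, again with illustrative synthetic data and nested candidate models \(X[:, :k]\) (in best-subset or stepwise selection the candidates would come from the search itself): pick the model size with the lowest cross-validated MSE, no \(\hat{\sigma}^2\) or distributional assumptions required.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

cv_mse = []
for k in range(1, p + 1):
    scores = cross_val_score(LinearRegression(), X[:, :k], y,
                             scoring="neg_mean_squared_error", cv=10)
    cv_mse.append(-scores.mean())  # estimated test MSE for M_k

best_k = int(np.argmin(cv_mse)) + 1
print(f"Cross-validation picks M_{best_k} (CV MSE = {cv_mse[best_k - 1]:.3f})")
```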