6.4 Considerations in High Dimensions
Figure 6.22
- Data sets containing more features \(p\) than observations \(n\) are often referred to as high-dimensional.
- Modern data sets can have a huge number of predictors (e.g., 500k SNPs, or every word ever entered into a search engine).
- When \(p \geq n\), linear regression memorizes the training data (zero residuals), but can perform terribly on test data.
 
Figure 6.23 - simulated data set with \(n = 20\) training observations, with features all unrelated to the outcome.
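A simulation in the spirit of Figure 6.23 can be sketched in a few lines of NumPy (a hypothetical reconstruction, not the book's actual code — `train_test_mse` is a made-up helper): fit least squares to \(n = 20\) pure-noise observations and watch training MSE collapse to zero as \(p\) approaches \(n\), while test MSE does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # training observations, as in Figure 6.23

def train_test_mse(p):
    # Features and outcome are independent noise: no true relationship.
    X_train = rng.standard_normal((n, p))
    y_train = rng.standard_normal(n)
    # lstsq gives the minimum-norm least-squares fit (an exact fit once p >= n)
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    X_test = rng.standard_normal((n, p))
    y_test = rng.standard_normal(n)
    train_mse = np.mean((y_train - X_train @ beta) ** 2)
    test_mse = np.mean((y_test - X_test @ beta) ** 2)
    return train_mse, test_mse

for p in (1, 5, 19, 20, 50):
    tr, te = train_test_mse(p)
    print(f"p={p:2d}  train MSE={tr:8.4f}  test MSE={te:8.4f}")
```

As \(p\) grows, the training MSE is driven to (numerically) zero even though there is nothing to learn; the test MSE stays large.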
Lasso (etc.) vs. Dimensionality
- Reducing flexibility (via the methods covered in this chapter) can help.
- It’s important to choose good tuning parameters for whatever method you use.
- Features that aren’t associated with \(Y\) increase test error (the “curse of dimensionality”).
- The model fits the noise in the training data, and the noise in the test data is different.
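To illustrate choosing a tuning parameter, here is a minimal sketch using ridge regression as a stand-in (chosen because it has a simple closed form; the lasso behaves analogously). The `ridge` helper, the data, and the \(\lambda\) grid are all made up for illustration; the point is that \(\lambda\) is picked on held-out data, not by training MSE.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50  # p > n: plain least squares would fit the training noise exactly

# Only the first 2 features are truly associated with Y; the rest are noise.
beta_true = np.zeros(p)
beta_true[:2] = 2.0
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)
X_val = rng.standard_normal((n, p))
y_val = X_val @ beta_true + rng.standard_normal(n)

def ridge(X, y, lam):
    # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Pick the tuning parameter by validation error, not training error.
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
val_mse = {lam: np.mean((y_val - X_val @ ridge(X, y, lam)) ** 2) for lam in lams}
best = min(val_mse, key=val_mse.get)
print("validation MSE by lambda:", {k: round(v, 2) for k, v in val_mse.items()})
print("chosen lambda:", best)
```

In practice cross-validation replaces the single validation set, but the principle is the same: the tuning parameter is chosen on data the model was not fit to.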
 
- When \(p > n\), never use training MSE, p-values, \(R^2\), etc., as evidence of goodness of fit, because they are likely to be wildly different from their test-set values.
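A quick demonstration of why training \(R^2\) is worthless here (a sketch with a made-up helper `train_r2`, using the uncentered \(R^2\) for simplicity): even when \(X\) and \(y\) are completely independent noise, the training \(R^2\) climbs to 1 as \(p\) reaches \(n\).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20

def train_r2(p):
    # X and y are independent noise: the true R^2 is zero.
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Uncentered R^2 (no intercept), just for illustration
    return 1 - (resid @ resid) / (y @ y)

print(f"p= 2: train R^2 = {train_r2(2):.3f}")
print(f"p=20: train R^2 = {train_r2(20):.3f}")  # ~1.0 despite zero true signal
```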
 
Figure 6.24