5.4 Advantages of LOOCV over Validation Set Approach
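LOOCV has two main advantages over the validation set approach.
- It has far less bias: each model is fit on \(n - 1\) observations, almost the entire data set, rather than on roughly half of it, so LOOCV tends not to overestimate the test error as much as the validation set approach does.
- It is deterministic: there is no randomness in the training/validation splits, so running LOOCV repeatedly on the same data set always yields the same test error estimate, whereas the validation set approach gives a different estimate for each random split.
The major disadvantage of LOOCV is that it is computationally expensive, since the model must be fit \(n\) times.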
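The contrast between the two approaches can be sketched in a few lines of plain Python. This is a minimal illustration, assuming simple linear regression with a single predictor and synthetic data; the helper names (`fit_line`, `validation_set_mse`, `loocv_mse`) and the data-generating line are made up for this example and do not come from the book.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    b0, b1 = model
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def validation_set_mse(xs, ys, seed):
    """Random half/half split: fit on one half, evaluate on the other."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return mse(model, [xs[i] for i in test], [ys[i] for i in test])


def loocv_mse(xs, ys):
    """Refit n times, each time holding out exactly one observation."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        model = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += mse(model, [xs[i]], [ys[i]])
    return total / n


# Synthetic data (an assumption for illustration): y = 2 + 0.5 x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# The validation set estimate depends on the random split ...
val_estimates = [validation_set_mse(xs, ys, seed) for seed in range(5)]
# ... while LOOCV involves no randomness and repeats exactly.
loocv_estimates = [loocv_mse(xs, ys) for _ in range(5)]

print("validation set:", [round(v, 3) for v in val_estimates])
print("LOOCV:         ", [round(v, 3) for v in loocv_estimates])
```

The validation set estimates change from seed to seed, while the five LOOCV values are identical. The loop in `loocv_mse` also makes the cost visible: one model fit per observation.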
A special case: for least-squares linear or polynomial regression, the following shortcut makes the cost of LOOCV the same as that of a single model fit:
\[CV_{n} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_{i} - \hat{y}_{i}}{1 - h_{i}}\right)^2\] where \(\hat{y}_{i}\) is the \(i\)th fitted value from the least-squares fit on the full data set and \(h_{i}\) is the leverage of the \(i\)th observation, as defined in equation 3.37 in the book for simple linear regression. The leverage lies between \(1/n\) and 1, so dividing by \(1 - h_{i}\) inflates the residuals of high-leverage observations, which therefore contribute relatively more to the CV statistic.
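As a rough check of the shortcut, the sketch below computes the CV statistic both ways for simple linear regression: once from a single fit using the leverages, and once by refitting \(n\) times. It assumes the simple linear regression leverage \(h_{i} = 1/n + (x_{i} - \bar{x})^2 / \sum_{j}(x_{j} - \bar{x})^2\); the function names and the small data set are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def leverages(xs):
    """Leverage of each observation in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


def loocv_shortcut(xs, ys):
    """CV statistic from ONE fit on the full data, using the leverages."""
    b0, b1 = fit_line(xs, ys)
    hs = leverages(xs)
    terms = [((y - (b0 + b1 * x)) / (1 - h)) ** 2
             for x, y, h in zip(xs, ys, hs)]
    return sum(terms) / len(xs)


def loocv_refit(xs, ys):
    """CV statistic by brute force: refit n times, one point held out."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total / n


# Made-up data; the last point sits far from the others in x,
# so it has the highest leverage.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 9.0]

hs = leverages(xs)
cv_fast = loocv_shortcut(xs, ys)
cv_slow = loocv_refit(xs, ys)

print("leverages:", [round(h, 3) for h in hs])
print("shortcut:", cv_fast, "refit:", cv_slow)
```

The two values agree up to floating-point rounding, and the point at \(x = 10\), the one farthest from \(\bar{x}\), has the largest leverage and therefore the largest inflation factor \(1/(1 - h_{i})\).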
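- In general, LOOCV can be used with many kinds of predictive models, including logistic regression, LDA, and QDA, although the shortcut formula applies only to least-squares fits; other models must be refit \(n\) times.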