5.4 Advantages of LOOCV over Validation Set Approach

  • LOOCV has two main advantages over the validation set approach.

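    1. It has far less bias. Each model is fit on \(n - 1\) observations, almost the entire data set, so LOOCV tends not to overestimate the test error as much as the validation set approach, which trains on only part of the data.
    2. It involves no randomness in how the data are split, so performing LOOCV repeatedly on the same data set always yields the same estimated test error, whereas the validation set estimate varies with the random split.
  • The major disadvantage of LOOCV is that it can be computationally expensive, because the model must be fit \(n\) times.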

  • A special case: for least-squares linear or polynomial regression, the following shortcut makes the cost of LOOCV the same as that of a single model fit:

\[CV_{n} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_{i} - \hat{y}_{i}}{1 - h_{i}}\right)^2\] where \(\hat{y}_{i}\) is the \(i\)th fitted value from the least-squares fit on the full data set and \(h_{i}\) is the leverage of the \(i\)th observation, as defined in equation 3.37 in the book for simple linear regression. The leverage lies between \(1/n\) and 1, so dividing by \(1 - h_{i}\) inflates the residuals of high-leverage observations, which therefore contribute relatively more to the CV statistic.
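
As a rough check of the shortcut, the sketch below compares it with brute-force LOOCV for simple linear regression in plain Python. The function names and the toy data are made up for illustration. The leverage is computed with the simple linear regression formula \(h_{i} = 1/n + (x_{i} - \bar{x})^2 / \sum_{j}(x_{j} - \bar{x})^2\), which is assumed here to be what equation 3.37 refers to.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverages(xs):
    """Leverage h_i of each observation in a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


def loocv_brute_force(xs, ys):
    """Refit the model n times, each time leaving out one observation."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        b0, b1 = fit_simple_ols(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total / n


def loocv_shortcut(xs, ys):
    """Fit once on all the data and rescale each residual by 1 - h_i."""
    n = len(xs)
    b0, b1 = fit_simple_ols(xs, ys)
    total = 0.0
    for x, y, h in zip(xs, ys, leverages(xs)):
        total += ((y - (b0 + b1 * x)) / (1 - h)) ** 2
    return total / n


# Made-up toy data; the last point sits far from the others and has high leverage.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 8.7]

print(loocv_brute_force(xs, ys))  # n separate fits
print(loocv_shortcut(xs, ys))     # one fit; agrees up to floating-point rounding
```

The two printed values should agree up to rounding error, even though the shortcut performs a single fit.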

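  • In general, LOOCV can be used with various kinds of models, including logistic regression, LDA, and QDA, although the shortcut above applies only to least-squares fits, so other models must be refit \(n\) times.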
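
To illustrate how general the procedure is, here is a minimal sketch of LOOCV written against an arbitrary fitting function. The names `loocv`, `fit_nearest_centroid`, and `zero_one_loss` are hypothetical, and the nearest-centroid classifier is only a simple stand-in for a model such as logistic regression or LDA, not an implementation of either. The data are invented for the example.

```python
def loocv(fit, loss, xs, ys):
    """Average held-out loss over the n leave-one-out splits.

    fit(train_xs, train_ys) must return a function that predicts y from x.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Train on every observation except the i-th, then predict the i-th.
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += loss(ys[i], predict(xs[i]))
    return total / n


def fit_nearest_centroid(train_xs, train_ys):
    """Toy classifier: predict the class whose training mean is closest to x."""
    centroids = {}
    for label in set(train_ys):
        members = [x for x, y in zip(train_xs, train_ys) if y == label]
        centroids[label] = sum(members) / len(members)
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def zero_one_loss(y_true, y_pred):
    """Misclassification loss: 1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if y_true == y_pred else 1.0


# Made-up one-dimensional data with two overlapping classes.
xs = [0.5, 1.0, 1.5, 2.0, 2.6, 3.0, 3.5, 4.0]
ys = ["a", "a", "a", "b", "a", "b", "b", "b"]

error_rate = loocv(fit_nearest_centroid, zero_one_loss, xs, ys)
print(error_rate)  # estimated test misclassification rate
```

Because the splits are fixed by the data rather than drawn at random, running `loocv` again on the same inputs returns the same estimate.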