5.1 Validation Set Approach

  • This approach randomly splits the data into a training set, used to fit the model, and a validation set, used to estimate the test error rate.

    • Note that in certain applications, such as time series analysis, a random split is not appropriate, because the observations are ordered in time.
  • The advantage of the validation set approach is that it is conceptually simple to understand and implement.

  • However, the validation error rate can be highly variable, depending on which observations are assigned to the training set and which to the validation set.

  • Additionally, we are giving up valuable data points by not using all of the data to estimate the model.

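    • Because models tend to perform worse when fit on fewer observations, the validation error rate will tend to overestimate the test error rate of the model fit on the entire data set.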
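
The points above can be illustrated with a minimal sketch. It is not taken from the notes: the synthetic data, the 50/50 split, the simple linear model, and the helper names `fit_line` and `validation_mse` are all assumptions chosen for illustration. The same data set is split ten times with different random seeds, and the validation MSE is reported for each split.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def validation_mse(xs, ys, seed, train_fraction=0.5):
    """Randomly split the data, fit on the training part,
    and return the mean squared error on the validation part."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)  # random assignment to the two sets
    n_train = int(train_fraction * len(indices))
    train_idx, val_idx = indices[:n_train], indices[n_train:]
    a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in val_idx]
    return sum(errors) / len(errors)


# Synthetic data (an assumption for illustration): y = 2 + 3x + noise.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 2.0) for x in xs]

# Same data, ten different random splits.
mses = [validation_mse(xs, ys, seed) for seed in range(10)]
for seed, mse in enumerate(mses):
    print(f"split seed {seed}: validation MSE = {mse:.3f}")
print(f"range across splits: {min(mses):.3f} to {max(mses):.3f}")
```

The spread between the smallest and largest value shows how much the estimate depends on the particular split. Each model here is fit on only half of the observations, which is the source of the upward bias described in the last bullet.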