5.1 Validation Set Approach
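The validation set approach involves randomly splitting the available observations into a training set and a validation (hold-out) set. The model is fit on the training set, and its error on the validation set serves as an estimate of the test error rate.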
- Note that in certain applications, such as time series analysis, a random split is not appropriate, because the observations are ordered in time.
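As a rough illustration of the split described above, the following sketch fits a simple linear regression on a random half of a synthetic data set and evaluates it on the other half. Everything here is an assumption made for the example: the helper names (`validation_split`, `fit_line`, `mse`), the 50/50 split, the simulated data, and the use of mean squared error as the validation error.

```python
import random

def validation_split(n, validation_fraction=0.5, seed=0):
    """Randomly partition indices 0..n-1 into (training, validation) lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_val = int(round(n * validation_fraction))
    return indices[n_val:], indices[:n_val]

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mse(a, b, xs, ys):
    """Mean squared error of the line a + b*x on the given points."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): y = 2 + 3x + noise.
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(100)]
y = [2.0 + 3.0 * xi + data_rng.gauss(0, 1) for xi in x]

# Fit on the training half only, then score on the held-out half.
train_idx, val_idx = validation_split(len(x), validation_fraction=0.5, seed=0)
a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
val_mse = mse(a, b, [x[i] for i in val_idx], [y[i] for i in val_idx])
print(f"validation MSE: {val_mse:.3f}")
```

The validation observations play no part in fitting, so `val_mse` is computed on data the model has not seen.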
The advantage of the validation set approach is that it is conceptually simple to understand and implement.
However, the validation error rate can be highly variable, depending on which observations end up in the training set and which in the validation set.
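Additionally, only the training subset is used to fit the model, so we give up valuable data points that could have improved the estimate.
- Because models fit on fewer observations tend to perform worse, the validation error rate will tend to overestimate the test error rate of a model fit on the entire data set.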
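The variability of the validation error can be seen by repeating the split with different random seeds on the same data. The sketch below does this for a simple linear regression. The helper names (`fit_line`, `validation_mse`), the synthetic data, the 50/50 split, and the choice of ten seeds are all assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def validation_mse(x, y, seed):
    """Split the data in half at random, fit on one half, score on the other."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    half = len(idx) // 2
    val, train = idx[:half], idx[half:]
    a, b = fit_line([x[i] for i in train], [y[i] for i in train])
    return sum((y[i] - (a + b * x[i])) ** 2 for i in val) / len(val)

# One fixed synthetic data set (an assumption): y = 2 + 3x + noise.
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(100)]
y = [2.0 + 3.0 * xi + data_rng.gauss(0, 1) for xi in x]

# Same data, ten different random splits: each gives its own error estimate.
estimates = [validation_mse(x, y, seed) for seed in range(10)]
for seed, est in enumerate(estimates):
    print(f"seed {seed}: validation MSE = {est:.3f}")
print(f"range: {min(estimates):.3f} to {max(estimates):.3f}")
```

The data never change between runs, so the spread in the printed estimates comes entirely from how the observations are assigned to the two sets.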