5.5 Time series data
For time series data, where observations should be assigned to the training or test set according to their sorted order, you can use initial_time_split(), which works similarly to initial_split(). Instead of sampling rows at random, it takes the first part of the data for training, so the data should already be sorted. The prop argument specifies what proportion of the data, counted from the start, is used as the training set.
data(drinks)
drinks_split <- initial_time_split(drinks)
train_data <- training(drinks_split)
test_data <- testing(drinks_split)
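As a rough sketch of how prop controls where the split falls, the example below uses a small made-up monthly data frame. The names toy_series, toy_split, toy_train and toy_test are hypothetical and only serve this illustration. It assumes the rsample package is installed.

```r
library(rsample)

# Hypothetical toy data: 10 monthly observations, already sorted by date
toy_series <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 10),
  value = 1:10
)

# Use the first 80% of the rows for training and the remainder for testing
toy_split <- initial_time_split(toy_series, prop = 0.8)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Without a lag, every training date precedes every test date
max(toy_train$date) < min(toy_test$date)
```

Because the split follows row order rather than the values of a date column, sorting the data beforehand is the user's responsibility.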
The lag argument specifies a lag period between the training and test sets: the test set starts lag rows earlier, so its first rows overlap the end of the training set. This is useful if lagged predictors will be used during training and testing.
drinks_lag_split <- initial_time_split(drinks, lag = 12)
train_data_lag <- training(drinks_lag_split)
test_data_lag <- testing(drinks_lag_split)
c(max(train_data_lag$date), min(test_data_lag$date))
## [1] "2011-03-01" "2010-04-01"
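In the output above, the latest training date is later than the earliest test date, which suggests that the test set reaches back into the training period by the lag. A minimal sketch of that overlap on a made-up data frame follows. The names toy_series, toy_lag_split, toy_train_lag, toy_test_lag and overlap are hypothetical, and the rsample package is assumed to be installed.

```r
library(rsample)

# Hypothetical toy data: 10 monthly observations, already sorted by date
toy_series <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 10),
  value = 1:10
)

# Same 80/20 split as before, but let the test set start 2 rows earlier
toy_lag_split <- initial_time_split(toy_series, prop = 0.8, lag = 2)
toy_train_lag <- training(toy_lag_split)
toy_test_lag  <- testing(toy_lag_split)

# Dates that appear in both the training and the test set
overlap <- toy_test_lag$date[toy_test_lag$date %in% toy_train_lag$date]
length(overlap)
```

The overlapping rows give lagged predictors in the test set something to look back on. They would typically be dropped again before computing performance metrics.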