5.5 Time series data

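For time series data, you usually want to assign rows to the training and test sets according to their sorted order rather than at random. initial_time_split() does this and otherwise works like initial_split(). The prop argument specifies the proportion of rows, taken from the start of the data, to use as the training set (the default is 3/4); the remaining rows form the test set. The function takes rows in the order they appear, so the data should already be sorted by time.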

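library(rsample)
# The drinks data set ships with the modeldata package
data(drinks, package = "modeldata")
# Default prop = 3/4: the first 75% of rows become the training set
drinks_split <- initial_time_split(drinks)
train_data <- training(drinks_split)
test_data <- testing(drinks_split)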
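
To see how prop changes the split, the following is a minimal sketch on a small made-up series. The sales data frame and its column names are hypothetical and only serve as an illustration; the functions themselves are the same ones used above.

```r
library(rsample)

# Hypothetical monthly series, already sorted by date
sales <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  value = seq_len(24)
)

# Use the first 80% of rows for training instead of the default 75%
sales_split <- initial_time_split(sales, prop = 0.8)
sales_train <- training(sales_split)
sales_test  <- testing(sales_split)

# Number of rows in each set
c(train = nrow(sales_train), test = nrow(sales_test))

# With no lag, every training date precedes every test date
max(sales_train$date) < min(sales_test$date)
```

Because the split is positional rather than random, running it again on the same data gives the same training and test sets.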

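The lag argument specifies a lag period between the training and test sets: the test set starts lag rows earlier, so it overlaps the end of the training set. This is useful if lagged predictors will be used during training and testing, because the first rows of the test set then have the history they need. In the example below, with lag = 12 on monthly data, the test set begins 12 months before the training set ends.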

drinks_lag_split <- initial_time_split(drinks, lag = 12)
train_data_lag <- training(drinks_lag_split)
test_data_lag <- testing(drinks_lag_split)
c(max(train_data_lag$date), min(test_data_lag$date))
## [1] "2011-03-01" "2010-04-01"
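
One way the overlap might be used is sketched below: compute a lagged predictor on the test set, then drop the overlapping rows before evaluation. This is an illustration under assumptions, not a prescribed workflow. The sales data frame, the n_lag value, and the lag_vec() helper are all hypothetical; lag_vec() is a small base R stand-in for a lag function.

```r
library(rsample)

# Hypothetical monthly series, already sorted by date
sales <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  value = seq_len(24)
)

n_lag <- 3
sales_lag_split <- initial_time_split(sales, prop = 0.75, lag = n_lag)
sales_test_lag  <- testing(sales_lag_split)

# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_vec <- function(x, k) c(rep(NA, k), head(x, -k))

# Lagged predictor computed within the test set
sales_test_lag$value_lag <- lag_vec(sales_test_lag$value, n_lag)

# The first n_lag rows only supplied history, so drop them before evaluating
sales_test_eval <- sales_test_lag[-seq_len(n_lag), ]

# No missing lagged values remain in the rows kept for evaluation
anyNA(sales_test_eval$value_lag)
```

Without the lag argument, the first n_lag rows of the test set would have no earlier values to draw on, and their lagged predictor would be NA.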