5.2 Common methods for splitting data

Choosing how to split the data into training and test sets is not always a trivial task. The right approach depends on the data and on the purpose of the analysis.

The most common approach is simple random sampling, which the rsample package provides in R through the initial_split() function. For the Ames housing dataset, a split that places 80% of the rows in the training set would be:

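library(tidymodels)   # attaches rsample (initial_split) and modeldata (ames)
# Fix the random number seed so the same split can be reproduced later
set.seed(123)
data(ames)
# Randomly assign 80% of the rows to training and the rest to testing
ames_split <- initial_split(ames, prop = 0.80)
ames_split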
## <Training/Testing/Total>
## <2344/586/2930>
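
To make the idea of random sampling concrete, the following is a minimal base R sketch of an 80/20 split. It is an illustration only and is not a description of how rsample is implemented; the data frame df and its columns id and y are hypothetical stand-ins for a real dataset.

```r
# Base R sketch of an 80/20 random split (illustration only)
set.seed(123)                                   # make the random draw reproducible
df <- data.frame(id = 1:2930, y = rnorm(2930))  # hypothetical stand-in data
n_train   <- floor(0.80 * nrow(df))             # number of rows for training
train_idx <- sample(nrow(df), size = n_train)   # row indices drawn without replacement
df_train  <- df[train_idx, ]                    # sampled rows form the training set
df_test   <- df[-train_idx, ]                   # all remaining rows form the test set
c(train = nrow(df_train), test = nrow(df_test))
```

Because the indices are drawn without replacement and the test set is built from the rows left over, no row can appear in both sets.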

The object ames_split is an rsplit object that stores only the partitioning information. To get the resulting training and test sets, call training() and testing():

ames_train <- training(ames_split)
ames_test  <- testing(ames_split)
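
As a quick sanity check, the sizes of the two data frames can be compared against the counts printed for the split. The sketch below repeats the earlier steps so that it runs on its own; it assumes the rsample and modeldata packages are installed and loads the data directly from modeldata rather than through tidymodels.

```r
library(rsample)                      # initial_split(), training(), testing()
data(ames, package = "modeldata")     # Ames housing data ships with modeldata
set.seed(123)
ames_split <- initial_split(ames, prop = 0.80)
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)
nrow(ames_train)                      # 2344 rows, matching the printed split
nrow(ames_test)                       # 586 rows
# The two sets together should account for every row of the original data
nrow(ames_train) + nrow(ames_test) == nrow(ames)
```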