5.2 Common methods for splitting data
Deciding how to split the data into training and test sets is not always trivial; the right choice depends on the data and on the purpose of the model.
The most common approach is simple random sampling, which is easy to do in R with the initial_split() function from the rsample package. For the Ames housing data, the call is:
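library(tidymodels)   # attaches rsample and modeldata (which provides ames)
set.seed(123)         # make the random split reproducible
data(ames)
ames_split <- initial_split(ames, prop = 0.80)   # 80% training, 20% test
ames_split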
## <Training/Testing/Total>
## <2344/586/2930>
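To make the idea of random sampling concrete, here is a minimal base R sketch that draws an 80/20 split by sampling row indices without replacement. It illustrates the concept only and makes no claim about how initial_split() works internally; the names n_trn, train_idx, manual_train and manual_test are hypothetical. Taking floor(n * prop) training rows gives the same 2344/586 sizes reported above, though not necessarily the same rows.

```r
library(tidymodels)  # attaches modeldata, which provides the ames data

data(ames)
set.seed(123)

n     <- nrow(ames)          # total number of rows
prop  <- 0.80                # share of rows assigned to training
n_trn <- floor(n * prop)     # hypothetical: number of training rows

# Simple random sampling: draw training row indices without replacement
train_idx <- sample(n, size = n_trn)

manual_train <- ames[train_idx, ]
manual_test  <- ames[-train_idx, ]   # every remaining row goes to the test set

c(train = nrow(manual_train), test = nrow(manual_test), total = n)
```

Because each row is equally likely to land in either set, the test set behaves like a random sample of the same population as the training set.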
The object ames_split is an rsplit object. To extract the training and test sets, call training() and testing():
ames_train <- training(ames_split)   # 2344 rows used to fit models
ames_test  <- testing(ames_split)    # 586 rows held out for evaluation
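A quick sanity check, sketched below, confirms that the two sets are disjoint and together cover the full data set. It repeats the split on a copy of ames with an added row-id column so rows can be traced; the column row_id and the objects ames_ids, ids_split, ids_train, ids_test and overlap are hypothetical names introduced only for this check.

```r
library(tidymodels)  # attaches rsample, dplyr and modeldata

data(ames)
set.seed(123)

# Hypothetical helper column: tag each row so we can see where it lands
ames_ids <- mutate(ames, row_id = row_number())

ids_split <- initial_split(ames_ids, prop = 0.80)
inherits(ids_split, "rsplit")   # the split object is an rsplit

ids_train <- training(ids_split)
ids_test  <- testing(ids_split)

# The two sets should not share any row ...
overlap <- intersect(ids_train$row_id, ids_test$row_id)
length(overlap)

# ... and together they should account for every row of the original data
nrow(ids_train) + nrow(ids_test) == nrow(ames)
```

Checking for overlap is worth doing whenever the split is built by hand, since any row shared between the sets would leak test information into training.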