Training bagged trees

  • Because bagging relies on a random process (bootstrap sampling), it is important to set a seed so results are reproducible.
  • nbagg defines how many bootstrap iterations (trees) to include in the bagged model.
  • coob = TRUE requests the out-of-bag (OOB) estimate of the error rate.
  • We use the control argument (passed to rpart.control) to change:
    • minsplit: the minimum number of observations that must exist in a node for a split to be attempted; we lower it from the default of 20 to 2.
    • cp (complexity parameter): any split that does not improve the fit by at least cp is pruned; since we do not want to prune, we change it from the default 0.01 to 0.
# rpart.control() comes from rpart, so the package must be loaded
library(rpart)

# make bootstrapping reproducible
set.seed(123)

# train bagged model
ames_bag1 <- ipred::bagging(
  formula = Sale_Price ~ .,
  data = ames_train,
  nbagg = 100,  
  coob = TRUE,
  control = rpart.control(minsplit = 2, cp = 0)
)

ames_bag1

# Bagging regression trees with 100 bootstrap replications 
# 
# Call: bagging.data.frame(formula = Sale_Price ~ ., data = ames_train, 
#     nbagg = 100, coob = TRUE, control = rpart.control(minsplit = 2, 
#         cp = 0))
# 
# Out-of-bag estimate of root mean squared error:  26216.47
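A natural follow-up is to check how the OOB RMSE behaves as the number of bootstrap replications grows; the error typically stabilizes once enough trees are included. The sketch below refits the same model for a few candidate nbagg values. It assumes ames_train from above is available, and the particular ntree values are illustrative choices, not prescribed by the text; the err component of an ipred bagging object holds the OOB error when coob = TRUE.

```r
library(ipred)
library(rpart)

# candidate numbers of bootstrap replications (illustrative values)
ntree <- c(25, 50, 100)

# refit the bagged model for each nbagg value and record the OOB RMSE
rmse <- vapply(ntree, function(n) {
  set.seed(123)  # same seed so runs differ only in nbagg
  fit <- bagging(
    formula = Sale_Price ~ .,
    data    = ames_train,
    nbagg   = n,
    coob    = TRUE,
    control = rpart.control(minsplit = 2, cp = 0)
  )
  fit$err  # OOB root mean squared error
}, numeric(1))

# plot OOB RMSE against the number of trees to see where it levels off
plot(ntree, rmse, type = "b",
     xlab = "Number of trees", ylab = "OOB RMSE")
```

If the curve has flattened by the largest nbagg value tried, adding more trees buys little further reduction in OOB error.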