6.3 Bagging and the Random Forest

The basic insight behind bagging is that an ensemble of models, in which the predictions of many different models are averaged, is generally more accurate than any individual model, because averaging reduces the variance of the predictions.

  • One way to build an ensemble is to train different kinds of models on the same data. This is hard to scale, since each new model must be developed individually.
  • A more automatable way to build an ensemble is to train many copies of the same kind of model on different sets of training data.
    • How do we get different sets of training data? Bootstrap the original set, i.e., resample it with replacement!
    • This is called bagging (bootstrap aggregating); see the sketch after this list.
  • Any model type can be bagged.
  • When decision trees are bagged, an extra step is usually performed: only a random subset of the features is considered at each split (in some variants, one subset is drawn per tree).
    • This is known as a random forest™; a brief example follows this list.
    • A random forest is less interpretable than a single tree, but the relative importance of the features can be estimated in two common ways (both appear in the last sketch below):
      • Permutation importance: randomly shuffle the values of one feature and measure the resulting loss of model accuracy.
      • Impurity-based importance: measure the average purity gain for splits made on each feature.
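
Below is a minimal bagging sketch in Python. It assumes numpy arrays with binary 0/1 labels and uses scikit-learn's DecisionTreeClassifier as the base learner, but any model type could be substituted; the names fit_bagged_trees and predict_bagged are illustrative, not a standard API.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_bagged_trees(X, y, n_models=50, seed=0):
        """Fit n_models trees, each on a bootstrap resample of (X, y)."""
        rng = np.random.default_rng(seed)
        n = len(X)
        models = []
        for _ in range(n_models):
            # Bootstrap: sample row indices with replacement.
            idx = rng.choice(n, size=n, replace=True)
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def predict_bagged(models, X):
        """Average the per-tree predictions (a majority vote for 0/1 labels)."""
        votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
        return (votes.mean(axis=0) >= 0.5).astype(int)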
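
The same idea is packaged up in scikit-learn's RandomForestClassifier. The sketch below uses the bundled breast-cancer dataset purely as a stand-in; the key parameters are n_estimators (the number of bagged trees) and max_features (the size of the random feature subset considered at each split).

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # "sqrt" means each split considers sqrt(n_features) randomly chosen features.
    forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
    forest.fit(X_train, y_train)
    print("test accuracy:", forest.score(X_test, y_test))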
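
Continuing the example above, both importance estimates are available in scikit-learn: permutation_importance shuffles one feature at a time and measures the drop in held-out accuracy, while feature_importances_ reports the average purity gain accumulated from splits on each feature.

    import numpy as np
    from sklearn.inspection import permutation_importance

    # Permutation importance: shuffle each feature and measure the accuracy drop.
    perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)

    # Impurity-based importance: average purity gain from splits on each feature.
    mdi = forest.feature_importances_

    # Report the five features ranked most important by permutation.
    for rank, i in enumerate(np.argsort(perm.importances_mean)[::-1][:5], start=1):
        print(f"{rank}. feature {i}: permutation={perm.importances_mean[i]:.3f}, impurity={mdi[i]:.3f}")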