8.19 Random forests

  • A problem with bagging is that bagged trees may be highly similar to each other.

  • For example, if the data set contains one strong predictor, most of the bagged trees will use it for the top split, so that

    • the trees will look quite similar

    • predictions from the bagged trees will be highly correlated

  • Averaging many highly correlated quantities does not reduce variance as much as averaging many uncorrelated quantities, so bagging similar trees gains less than bagging dissimilar ones.
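
The last point can be made concrete with a small sketch. Assume B quantities that each have variance sigma^2 and a common pairwise correlation rho between 0 and 1 (an idealization; real bagged trees will not be exactly equicorrelated). Their average then has variance rho*sigma^2 + (1 - rho)*sigma^2/B, so the first term does not shrink however many trees are averaged. The function names below are hypothetical and chosen only for illustration; the simulation builds the correlation from a shared Gaussian component, which is one convenient construction among several.

```python
import random
import statistics


def variance_of_mean(sigma2, rho, B):
    """Variance of the average of B quantities that each have variance
    sigma2 and share a common pairwise correlation rho (assumed 0 <= rho <= 1)."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / B


def simulate_variance_of_mean(rho, B, n_rep=5000, seed=0):
    """Monte Carlo check of the formula for sigma2 = 1.

    Each draw is sqrt(rho) * shared + sqrt(1 - rho) * own noise, which gives
    unit variance and pairwise correlation rho (an illustrative assumption)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_rep):
        shared = rng.gauss(0.0, 1.0)  # component common to all B quantities
        draws = [
            (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
            for _ in range(B)
        ]
        means.append(sum(draws) / B)
    return statistics.pvariance(means)


if __name__ == "__main__":
    B = 25
    for rho in (0.0, 0.5, 0.9):
        print(rho, variance_of_mean(1.0, rho, B), simulate_variance_of_mean(rho, B))
```

With rho = 0 the variance of the average falls like 1/B, while with rho close to 1 it stays close to sigma^2 no matter how large B is. This is the sense in which highly correlated bagged trees limit the benefit of averaging.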