8.20 Random forests: advantages over bagging
Random forests overcome this problem (bagged trees that are highly correlated because a strong predictor dominates the top splits) by forcing each split to consider only a random subset of \(m\) predictors (typically \(m \approx \sqrt{p}\))
Thus at each split, the algorithm is NOT ALLOWED to consider a majority of the available predictors; on average a fraction \((p - m)/p\) of the splits will not even consider the strong predictor, giving the other predictors a chance
This decorrelates the trees and makes the average of the resulting trees less variable (more reliable)
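For concreteness (an illustrative choice of numbers, not from the source): with \(p = 16\) predictors and \(m = \sqrt{16} = 4\), a given predictor is excluded from each split with probability \((p - m)/p = 12/16 = 0.75\), so roughly three quarters of the splits never even look at the strong predictor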
The only difference between bagging and random forests is the choice of predictor subset size \(m\) at each split: a random forest built with \(m = p\) is simply bagging
For both, we build a number of decision trees on bootstrapped training samples
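As a minimal sketch (not from the source), the code below compares the two on synthetic data using scikit-learn's RandomForestClassifier, whose max_features parameter plays the role of \(m\): "sqrt" gives a random forest with \(m \approx \sqrt{p}\), while None lets every split see all \(p\) predictors, which reduces to bagging.

```python
# Sketch: random forest (m = sqrt(p)) vs. bagging (m = p) via max_features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with p = 20 predictors (illustrative numbers only).
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Random forest: each split considers only m = sqrt(p) randomly chosen predictors.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

# Bagging as the special case m = p: every split considers all predictors.
bag = RandomForestClassifier(n_estimators=200, max_features=None, random_state=0)

print("random forest CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("bagging (m = p) CV accuracy:", cross_val_score(bag, X, y, cv=5).mean())
```

Both ensembles grow the same number of trees on bootstrapped samples; only the per-split predictor subset differs, which is exactly the decorrelation mechanism described above.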