8.20 Random forests: advantages over bagging
Random forests overcome this problem (bagged trees that are highly correlated because a strong predictor dominates the top splits) by forcing each split to consider only a random subset of \(m\) predictors (typically \(m \approx \sqrt{p}\))
Thus at each split, the algorithm is NOT ALLOWED to consider a majority of the available predictors; on average a fraction \((p - m)/p\) of the splits will not even consider the strong predictor, giving the other predictors a chance
This decorrelates the trees and makes the average of the resulting trees less variable (more reliable)
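For concreteness (an illustrative choice of numbers, not from the source): with \(p = 16\) predictors and \(m = \sqrt{16} = 4\), a given predictor is excluded from each split with probability \((p - m)/p = 12/16 = 0.75\), so roughly three quarters of the splits never even look at the strong predictor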
The only difference between bagging and random forests is the choice of predictor subset size \(m\) at each split: a random forest built with \(m = p\) is simply bagging
For both, we build a number of decision trees on bootstrapped training samples
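As a minimal sketch (not from the source), the code below compares the two on synthetic data using scikit-learn's RandomForestClassifier, whose max_features parameter plays the role of \(m\): "sqrt" gives a random forest with \(m \approx \sqrt{p}\), while None lets every split see all \(p\) predictors, which reduces to bagging.

```python
# Sketch: random forest (m = sqrt(p)) vs. bagging (m = p) via max_features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with p = 20 predictors (illustrative numbers only).
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Random forest: each split considers only m = sqrt(p) randomly chosen predictors.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

# Bagging as the special case m = p: every split considers all predictors.
bag = RandomForestClassifier(n_estimators=200, max_features=None, random_state=0)

print("random forest CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("bagging (m = p) CV accuracy:", cross_val_score(bag, X, y, cv=5).mean())
```

Both ensembles grow the same number of trees on bootstrapped samples; only the per-split predictor subset differs, which is exactly the decorrelation mechanism described above.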