8.23 Boosting algorithm

1. Set \(\hat{f}(x) = 0\) and \(r_i = y_i\) for all \(i\) in the training set

2. For \(b = 1, 2, \ldots, B\), repeat:

   (a) Fit a tree \(\hat{f}^b\) with \(d\) splits (\(d + 1\) terminal nodes) to the training data \((X, r)\)

   (b) Update \(\hat{f}\) by adding a shrunken version of the new tree: \(\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x)\)

   (c) Update the residuals: \(r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)\)

3. Output the boosted model: \(\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)\)

where:

\(\hat{f}(x)\) = the boosted model (a shrunken sum of the individual fitted trees \(\hat{f}^b\))

\(r\) = residuals

\(d\) = number of splits in each tree (controls the complexity of the boosted ensemble)

\(B\) = number of trees in the ensemble

\(\lambda\) = shrinkage parameter (a small positive number that controls the rate at which boosting learns; typically 0.01 or 0.001, but the right choice can depend on the problem)
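
A minimal from-scratch sketch of the algorithm above, assuming scikit-learn's `DecisionTreeRegressor` for the individual trees (the helper names `boost_trees` and `boost_predict` are illustrative, not standard):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_trees(X, y, B=1000, d=1, lam=0.01):
    """Fit B shrunken trees to successive residuals (steps 1-2 above)."""
    r = np.asarray(y, dtype=float).copy()  # step 1: f_hat = 0, so residuals start as y
    trees = []
    for _ in range(B):  # step 2: repeat B times
        # (a) fit a small tree with d splits (d + 1 terminal nodes) to the residuals
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        # (b)/(c) shrink the new tree by lambda and remove its fit from the residuals
        r -= lam * tree.predict(X)
        trees.append(tree)
    return trees

def boost_predict(trees, X, lam=0.01):
    # step 3: the boosted model is the shrunken sum of all B trees
    return lam * sum(tree.predict(X) for tree in trees)
```

Usage: fit with `trees = boost_trees(X_train, y_train)`, then predict with `boost_predict(trees, X_test)`; `lam` must match between the two calls.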

  • Each of the trees can be small, with just a few terminal nodes (determined by \(d\))

  • By fitting small trees to the residuals, we slowly improve our model (\(\hat{f}\)) in areas where it doesn’t perform well

  • The shrinkage parameter \(\lambda\) slows the process down further, allowing more and different shaped trees to ‘attack’ the residuals

  • Unlike bagging and random forests, boosting can OVERFIT if \(B\) is too large, so \(B\) is selected via cross-validation (see the sketch below)
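
One way to select \(B\), sketched here with scikit-learn's `GradientBoostingRegressor` on a synthetic dataset (the grid of \(B\) values and all data-generation settings are assumptions for illustration): score each ensemble size by cross-validation and keep the one with the lowest estimated test error.

```python
from sklearn.datasets import make_regression  # synthetic data, for illustration only
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# CV error typically falls as B grows, then creeps back up once boosting overfits
for B in (100, 500, 1000, 5000):
    gbm = GradientBoostingRegressor(n_estimators=B, max_depth=1, learning_rate=0.01)
    mse = -cross_val_score(gbm, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"B = {B:5d}   CV MSE = {mse:.1f}")
```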