8.23 Boosting algorithm
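
The symbols defined below match the standard boosting procedure for regression trees. A sketch of that procedure, assuming this is the formulation intended here (the tree index b and the per-tree fit f̂^b are notation introduced for this sketch):

```latex
\begin{enumerate}
  \item Set $\hat{f}(x) = 0$ and $r_i = y_i$ for all $i$ in the training set.
  \item For $b = 1, 2, \ldots, B$:
    \begin{enumerate}
      \item Fit a tree $\hat{f}^{b}$ with $d$ splits ($d + 1$ terminal nodes) to the training data $(X, r)$.
      \item Update the model: $\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^{b}(x)$.
      \item Update the residuals: $r_i \leftarrow r_i - \lambda \hat{f}^{b}(x_i)$.
    \end{enumerate}
  \item Output the boosted model $\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^{b}(x)$.
\end{enumerate}
```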

where:

f̂(x) = the boosted model (a sum of shrunken decision trees, each fitted to the current residuals)

r = residuals

d = number of splits in each tree (controls the complexity of the boosted ensemble)

λ = shrinkage parameter (a small positive number that controls the rate at which boosting learns; typically 0.01 or 0.001, but the right choice depends on the problem)

  • Each of the trees can be small, with just a few terminal nodes (d splits give d + 1 terminal nodes)

  • By fitting small trees to the residuals, we slowly improve our model (f̂) in areas where it doesn’t perform well

  • The shrinkage parameter λ slows the process down further, allowing more and differently shaped trees to ‘attack’ the residuals

  • Unlike bagging and random forests, boosting can OVERFIT if B (the number of trees) is too large, although this overfitting tends to occur slowly. B is selected via cross-validation
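
The points above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it uses a single predictor, trees with d = 1 split (stumps), synthetic data, and a single hold-out set in place of full cross-validation for choosing B. All function names (`fit_stump`, `boost`, `predict`, `mse`) are hypothetical.

```python
import math
import random


def fit_stump(xs, rs):
    """Fit a one-split regression tree (d = 1) to residuals rs by least squares."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot split between identical x values
        left = [rs[i] for i in order[:k]]
        right = [rs[i] for i in order[k:]]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((v - mean_l) ** 2 for v in left) + sum((v - mean_r) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (lo + hi) / 2, mean_l, mean_r)
    if best is None:  # all x identical: fall back to predicting the mean
        m = sum(rs) / len(rs)
        return (xs[0], m, m)
    return best[1:]  # (threshold, left mean, right mean)


def stump_predict(stump, x):
    threshold, mean_l, mean_r = stump
    return mean_l if x <= threshold else mean_r


def boost(xs, ys, B, lam):
    """Start from f̂(x) = 0 and r = y, then fit B stumps to the residuals in turn."""
    stumps = []
    r = list(ys)
    for _ in range(B):
        s = fit_stump(xs, r)
        stumps.append(s)
        # Shrink each tree's contribution by lam before updating the residuals.
        r = [ri - lam * stump_predict(s, xi) for xi, ri in zip(xs, r)]
    return stumps


def predict(stumps, lam, x, B=None):
    """Boosted prediction using the first B trees (all trees if B is None)."""
    used = stumps if B is None else stumps[:B]
    return lam * sum(stump_predict(s, x) for s in used)


def mse(stumps, lam, xs, ys, B=None):
    return sum((y - predict(stumps, lam, x, B)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): noisy sine curve.
rng = random.Random(0)
xs = [rng.uniform(0, 3) for _ in range(120)]
ys = [math.sin(2 * x) + rng.gauss(0, 0.3) for x in xs]
train_x, train_y = xs[:80], ys[:80]
valid_x, valid_y = xs[80:], ys[80:]

lam, B_max = 0.1, 200
stumps = boost(train_x, train_y, B_max, lam)

# Hold-out error for every ensemble size 0..B_max; pick the B that minimises it.
valid_err = [mse(stumps, lam, valid_x, valid_y, B) for B in range(B_max + 1)]
best_B = min(range(B_max + 1), key=valid_err.__getitem__)
print("chosen B:", best_B, "hold-out MSE:", valid_err[best_B])
```

Training error can only go down as trees are added, so it cannot be used to choose B; the hold-out (or cross-validated) error is what reveals when extra trees stop helping. Smaller values of `lam` generally need a larger `B_max` to reach a comparable fit.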