8.23 Boosting algorithm

1. Set \(\hat{f}(x) = 0\) and \(r_i = y_i\) for all \(i\) in the training set

2. For \(b = 1, 2, \ldots, B\), repeat:

   (a) Fit a tree \(\hat{f}^b\) with \(d\) splits (\(d + 1\) terminal nodes) to the training data \((X, r)\)

   (b) Update \(\hat{f}\) by adding a shrunken version of the new tree: \(\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x)\)

   (c) Update the residuals: \(r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)\)

3. Output the boosted model: \(\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)\)

where:

\(\hat{f}(x)\) = the boosted model (a shrunken sum of the individual fitted trees \(\hat{f}^b\))

\(r\) = residuals

\(d\) = number of splits in each tree (controls the complexity of the boosted ensemble)

\(B\) = number of trees in the ensemble

\(\lambda\) = shrinkage parameter (a small positive number that controls the rate at which boosting learns; typically 0.01 or 0.001, but the right choice can depend on the problem)
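
A minimal from-scratch sketch of the algorithm above, assuming scikit-learn's `DecisionTreeRegressor` for the individual trees (the helper names `boost_trees` and `boost_predict` are illustrative, not standard):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_trees(X, y, B=1000, d=1, lam=0.01):
    """Fit B shrunken trees to successive residuals (steps 1-2 above)."""
    r = np.asarray(y, dtype=float).copy()  # step 1: f_hat = 0, so residuals start as y
    trees = []
    for _ in range(B):  # step 2: repeat B times
        # (a) fit a small tree with d splits (d + 1 terminal nodes) to the residuals
        tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        # (b)/(c) shrink the new tree by lambda and remove its fit from the residuals
        r -= lam * tree.predict(X)
        trees.append(tree)
    return trees

def boost_predict(trees, X, lam=0.01):
    # step 3: the boosted model is the shrunken sum of all B trees
    return lam * sum(tree.predict(X) for tree in trees)
```

Usage: fit with `trees = boost_trees(X_train, y_train)`, then predict with `boost_predict(trees, X_test)`; `lam` must match between the two calls.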

  • Each of the trees can be small, with just a few terminal nodes (determined by \(d\))

  • By fitting small trees to the residuals, we slowly improve our model (\(\hat{f}\)) in areas where it doesn’t perform well

  • The shrinkage parameter \(\lambda\) slows the process down further, allowing more and different shaped trees to ‘attack’ the residuals

  • Unlike bagging and random forests, boosting can OVERFIT if \(B\) is too large, so \(B\) is selected via cross-validation (see the sketch below)
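
One way to select \(B\), sketched here with scikit-learn's `GradientBoostingRegressor` on a synthetic dataset (the grid of \(B\) values and all data-generation settings are assumptions for illustration): score each ensemble size by cross-validation and keep the one with the lowest estimated test error.

```python
from sklearn.datasets import make_regression  # synthetic data, for illustration only
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# CV error typically falls as B grows, then creeps back up once boosting overfits
for B in (100, 500, 1000, 5000):
    gbm = GradientBoostingRegressor(n_estimators=B, max_depth=1, learning_rate=0.01)
    mse = -cross_val_score(gbm, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"B = {B:5d}   CV MSE = {mse:.1f}")
```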