8.9 Pruning a tree

  1. Grow a very large tree \(T_0\) as before

  2. Apply cost-complexity pruning to \(T_0\) to obtain a sequence of BEST subtrees, as a function of \(\alpha\)

For each value of \(\alpha\), cost-complexity pruning finds the subtree \(T \subseteq T_0\) that minimizes (Eq. 8.4) \(\sum_{m=1}^{|T|}\sum_{i:\,x_i{\in}R_m}(y_i-\hat{y}_{R_m})^2 + \alpha|T|\)

where

\(\alpha \geq 0\) is a tuning parameter

\(|T|\) is the number of terminal nodes in the subtree \(T\)

\(R_m\) is the rectangle/region (i.e., the subset of predictor space) corresponding to the \(m\)th terminal node

\(\hat{y}_{R_m}\) is the mean response for the training observations in \(R_m\)

  • the tuning parameter \(\alpha\) controls the trade-off between:

    1. the subtree’s complexity (the number of terminal nodes)

    2. the subtree’s fit to the training data

    (the sketch below traces how the best subtree shrinks as \(\alpha\) grows)
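A minimal sketch of Steps 1 and 2 (not from the notes), assuming scikit-learn and the synthetic `X`, `y` below: grow a large tree \(T_0\), then use `cost_complexity_pruning_path` to get the sequence of \(\alpha\) values at which the best subtree changes. Note that scikit-learn's `ccp_alpha` penalizes a per-sample (normalized) version of the RSS, so its numeric scale differs from \(\alpha\) in Eq. 8.4 by roughly a factor of \(n\).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data (an assumption, not from the notes)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                    # two predictors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Step 1: grow a very large tree T_0
big_tree = DecisionTreeRegressor(min_samples_leaf=2, random_state=0).fit(X, y)

# Step 2: cost-complexity pruning path -- the sequence of alphas at which
# the best subtree of T_0 changes (a nested sequence of subtrees)
path = big_tree.cost_complexity_pruning_path(X, y)

# Larger alpha => heavier penalty on |T| => smaller subtree
for alpha in path.ccp_alphas[::5]:
    subtree = DecisionTreeRegressor(ccp_alpha=alpha, min_samples_leaf=2,
                                    random_state=0).fit(X, y)
    print(f"alpha={alpha:.4f}  terminal nodes |T|={subtree.get_n_leaves()}")
```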

  3. Choose \(\alpha\) using K-fold cross-validation (see the sketch after this list)

    • 3.1) repeat Steps 1 and 2 on all but the \(k\)th fold, i.e., on each \((K-1)/K\) fraction of the training data

    • 3.2) evaluate predictions on the held-out fold, average the results across folds, and pick \(\alpha\) to minimize the average MSE

      • Recall that in K-fold cross-validation (say K = 5): the model is estimated on 80% of the data five different times, the predictions are made for the remaining 20%, and the test MSEs are averaged

  4. Return the subtree from Step 2 that corresponds to the chosen value of \(\alpha\)
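A minimal sketch of Steps 3 and 4 under the same assumptions (it reuses `X`, `y`, and `path` from the previous sketch): for each candidate \(\alpha\), `cross_val_score` regrows and prunes the tree on each set of \(K-1\) folds and computes the MSE on the held-out fold; the \(\alpha\) with the smallest average MSE is kept and the final subtree is refit on the full training data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Step 3: K-fold cross-validation over the candidate alphas
K = 5
cv_mse = []
for alpha in path.ccp_alphas:
    tree = DecisionTreeRegressor(ccp_alpha=alpha, min_samples_leaf=2,
                                 random_state=0)
    # negative MSE on each held-out fold, averaged over the K folds
    scores = cross_val_score(tree, X, y, cv=K,
                             scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())

best_alpha = path.ccp_alphas[int(np.argmin(cv_mse))]

# Step 4: return the subtree of T_0 corresponding to the chosen alpha
pruned_tree = DecisionTreeRegressor(ccp_alpha=best_alpha, min_samples_leaf=2,
                                    random_state=0).fit(X, y)
print(f"chosen alpha={best_alpha:.4f}, "
      f"terminal nodes |T|={pruned_tree.get_n_leaves()}")
```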