8.9 Pruning a tree
1) Grow a very large tree T0 as before
2) Apply cost-complexity pruning to T0 to obtain a sequence of best subtrees, as a function of α
Cost-complexity pruning minimizes (Eq. 8.4)

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha \, |T|$$
where
α ≥ 0
|T| is the number of terminal nodes in the subtree T
$R_m$ is the rectangle/region (i.e., the subset of predictor space) corresponding to the m-th terminal node
$\hat{y}_{R_m}$ is the mean response for the training observations in $R_m$
the tuning parameter α controls a trade-off between:
the subtree's complexity (the number of terminal nodes)
the subtree's fit to the training data
when α = 0 the subtree is simply T0 itself; as α increases, each additional terminal node carries a price, so branches get pruned (a code sketch of steps 1) and 2) follows this list)
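The notes do not prescribe software; as a minimal sketch, the following assumes scikit-learn, whose `ccp_alpha` parameter penalizes exactly this cost-complexity criterion. The synthetic data from `make_regression` is an assumption, standing in for a real training set.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Toy regression data standing in for a real training set (an assumption)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Step 1: grow a very large tree T0 (no depth limit, so it deliberately overfits)
t0 = DecisionTreeRegressor(random_state=0).fit(X, y)

# Step 2: the pruning path gives the effective alphas at which the best
# subtree changes; each ccp_alpha value corresponds to one subtree of T0
path = t0.cost_complexity_pruning_path(X, y)
for alpha in path.ccp_alphas:
    subtree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X, y)
    print(f"alpha = {alpha:.4f}  terminal nodes = {subtree.get_n_leaves()}")
```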
3) Choose α using K-fold cross-validation
3.1) repeat steps 1) and 2) on all but the kth fold, i.e., on each (K−1)/K fraction of the training data
3.2) evaluate the mean squared prediction error on the held-out kth fold as a function of α; average the results over the folds and pick α to minimize the average MSE
- Recall that in K-fold cross-validation (say K = 5): the model is estimated on 80% of the data five different times, predictions are made for the remaining 20% each time, and the resulting test MSEs are averaged
4) Return the subtree from step 2) that corresponds to the chosen value of α (see the sketch below)
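A hedged sketch of steps 3) and 4), under the same scikit-learn and synthetic-data assumptions as above: `cross_val_score` handles the fold splitting, and refitting with the chosen `ccp_alpha` recovers the corresponding subtree of T0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Toy regression data standing in for a real training set (an assumption)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Candidate alphas: the breakpoints of the pruning path of the large tree T0
ccp_alphas = (
    DecisionTreeRegressor(random_state=0)
    .fit(X, y)
    .cost_complexity_pruning_path(X, y)
    .ccp_alphas
)

K = 5
avg_mse = []
for alpha in ccp_alphas:
    tree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
    # Steps 3.1-3.2: fit on each (K-1)/K fraction, score the held-out fold,
    # then average the K held-out MSEs for this value of alpha
    scores = cross_val_score(tree, X, y, cv=K, scoring="neg_mean_squared_error")
    avg_mse.append(-scores.mean())

best_alpha = ccp_alphas[int(np.argmin(avg_mse))]

# Step 4: refitting with ccp_alpha = best_alpha yields the subtree of T0
# that corresponds to the chosen value of alpha
final_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(f"chosen alpha = {best_alpha:.4f}, terminal nodes = {final_tree.get_n_leaves()}")
```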