8.9 Pruning a tree
Grow a very large tree \(T_0\) as before
Apply cost-complexity pruning to \(T_0\) to obtain a sequence of BEST subtrees, as a function of \(\alpha\)
For each value of \(\alpha\), cost-complexity pruning finds the subtree \(T \subseteq T_0\) that minimizes (Eq. 8.4) \(\sum_{m=1}^{|T|}\sum_{i:\,x_i{\in}R_m}(y_i-\hat{y}_{R_m})^2 + \alpha|T|\)
where
\(\alpha\) \(\geq\) 0
\(|T|\) is the number of terminal nodes in the subtree \(T\)
\(R_m\) is the rectangle/region (i.e., the subset of predictor space) corresponding to the \(m\)th terminal node
\(\hat{y}_{R_m}\) is the mean response for the training observations in \(R_m\)
the tuning parameter \(\alpha\) controls a trade-off between:
the subtree’s complexity (the number of terminal nodes)
the subtree’s fit to the training data (a code sketch of this pruning step follows below)
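To make Step 2) concrete, here is a minimal sketch of the pruning path, assuming scikit-learn's implementation of minimal cost-complexity pruning (its `cost_complexity_pruning_path` method and `ccp_alpha` parameter) and hypothetical training arrays `X_train`, `y_train`; these names and the simulated data are illustrative, not from the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data standing in for the regression setting above.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train[:, 0] + rng.normal(scale=0.5, size=200)

# Step 1): grow a large tree T_0 (default stopping rules, no pruning).
full_tree = DecisionTreeRegressor(random_state=0)

# Step 2): the cost-complexity pruning path gives the sequence of alpha
# values at which the best subtree changes.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas  # increasing alpha => smaller subtrees

# Refitting with a given ccp_alpha yields the best subtree for that alpha,
# i.e. the minimizer of RSS + alpha * |T|.
subtrees = [
    DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X_train, y_train)
    for a in ccp_alphas
]
n_leaves = [t.get_n_leaves() for t in subtrees]  # |T| for each subtree
```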
Choose \(\alpha\) using K-fold cross-validation:
3.1) repeat steps 1) and 2) on each \((K-1)/K\)th fraction of the training data, and evaluate the mean squared prediction error on the held-out fold as a function of \(\alpha\)
3.2) average the results for each value of \(\alpha\) and pick the \(\alpha\) that minimizes the average MSE
- Recall that in K-fold cross-validation (say K = 5): the model is estimated on 80% of the data five different times, predictions are made for the remaining 20% each time, and the resulting test MSEs are averaged
Return the subtree from Step 2) that corresponds to the chosen value of \(\alpha\)
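Continuing the sketch above (same hypothetical `X_train`, `y_train`, and `ccp_alphas`), Steps 3) and 4) might look like the following, using scikit-learn's `cross_val_score` as one possible way to get the K-fold estimates of test MSE.

```python
from sklearn.model_selection import cross_val_score

# Reuses np, DecisionTreeRegressor, X_train, y_train, ccp_alphas from the
# sketch above.

# Steps 3.1)-3.2): for each candidate alpha, estimate the test MSE by
# K-fold cross-validation (K = 5) and average across folds.
K = 5
cv_mse = []
for a in ccp_alphas:
    scores = cross_val_score(
        DecisionTreeRegressor(random_state=0, ccp_alpha=a),
        X_train,
        y_train,
        cv=K,
        scoring="neg_mean_squared_error",
    )
    cv_mse.append(-scores.mean())  # flip sign back to an MSE

# Pick the alpha with the smallest cross-validated MSE.
best_alpha = ccp_alphas[int(np.argmin(cv_mse))]

# Step 4): return the subtree from Step 2) corresponding to the chosen alpha.
final_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha)
final_tree.fit(X_train, y_train)
```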