8.9 Pruning a tree

  1. Grow a very large tree $T_0$ as before

  2. Apply cost-complexity pruning to $T_0$ to obtain a sequence of best subtrees, as a function of $\alpha$

Cost-complexity pruning minimizes (Eq. 8.4):

$$\sum_{m=1}^{|T|} \; \sum_{x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

where

$\alpha \ge 0$ is a tuning parameter

$|T|$ is the number of terminal nodes of the subtree $T$

$R_m$ is the rectangle/region (i.e., the subset of predictor space) corresponding to the $m$th terminal node

$\hat{y}_{R_m}$ is the mean response for the training observations in $R_m$
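To make Eq. 8.4 concrete, here is a minimal Python sketch (not from the source) that evaluates the penalized criterion for a fitted scikit-learn regression tree; the helper name `cost_complexity` and the arrays `X`, `y` are hypothetical placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def cost_complexity(tree: DecisionTreeRegressor, X, y, alpha: float) -> float:
    """Evaluate Eq. 8.4: within-leaf RSS summed over the |T| terminal
    nodes, plus the complexity penalty alpha * |T|.
    X, y: NumPy arrays of training predictors and responses."""
    leaf_ids = tree.apply(X)                     # terminal node hit by each x_i
    rss = 0.0
    for m in np.unique(leaf_ids):                # loop over the |T| terminal nodes
        y_m = y[leaf_ids == m]                   # responses falling in region R_m
        rss += np.sum((y_m - y_m.mean()) ** 2)   # sum of (y_i - yhat_{R_m})^2
    return rss + alpha * tree.get_n_leaves()     # ... + alpha * |T|
```

With $\alpha = 0$ this returns the training RSS of the full tree $T_0$; increasing $\alpha$ makes trees with many terminal nodes more expensive.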

  • the tuning parameter $\alpha$ controls a trade-off between:

    1. the subtree’s complexity (the number of terminal nodes), and

    2. the subtree’s fit to the training data

    When $\alpha = 0$ the criterion is just the training RSS, so the full tree $T_0$ is optimal; as $\alpha$ grows, a larger price is paid per terminal node and branches are pruned off in a nested fashion (see the sketch below)
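A minimal sketch of Steps 1 and 2 using scikit-learn, which exposes cost-complexity pruning through `cost_complexity_pruning_path` and the `ccp_alpha` parameter; the training arrays `X_train` and `y_train` are assumed to exist.

```python
from sklearn.tree import DecisionTreeRegressor

# Step 1: grow a large tree T0 (default settings leave it fully grown)
tree = DecisionTreeRegressor(random_state=0)

# Step 2: the effective alphas at which the best subtree changes
path = tree.cost_complexity_pruning_path(X_train, y_train)

# one best subtree per alpha; larger alpha => fewer terminal nodes
for alpha in path.ccp_alphas:
    subtree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
    subtree.fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  |T|={subtree.get_n_leaves()}")
```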

  3. Choose $\alpha$ using K-fold cross-validation (see the sketch after this list)

    • 3.1) repeat Steps 1 and 2 on each $(K-1)/K$th fraction of the training data, leaving out the $k$th fold

    • 3.2) evaluate the MSE on the left-out $k$th fold as a function of $\alpha$; average the results over the folds and pick $\alpha$ to minimize the average MSE

      • Recall that in K-fold cross-validation (say $K = 5$): the model is estimated on 80% of the data five different times, predictions are made for the remaining 20%, and the test MSEs are averaged
  4. Return the subtree from Step 2 that corresponds to the chosen value of $\alpha$
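A minimal sketch of Steps 3 and 4 under the same assumptions, reusing `path.ccp_alphas` from the previous sketch as the candidate grid: `GridSearchCV` runs K-fold CV over `ccp_alpha`, picks the value minimizing the average MSE, and refits the corresponding subtree on the full training data.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Step 3: K-fold CV (here K = 5) over the candidate alphas from the pruning path
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},
    cv=5,
    scoring="neg_mean_squared_error",   # i.e., pick alpha minimizing average MSE
)
search.fit(X_train, y_train)

# Step 4: the subtree corresponding to the chosen alpha (refit on all training data)
chosen_alpha = search.best_params_["ccp_alpha"]
final_tree = search.best_estimator_
print(f"chosen alpha={chosen_alpha:.4f}  |T|={final_tree.get_n_leaves()}")
```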