8.9 Pruning a tree

  1. Grow a very large tree \(T_0\) as before

  2. Apply cost-complexity pruning to \(T_0\) to obtain a sequence of BEST subtrees, as a function of \(\alpha\)

For each value of \(\alpha\), cost-complexity pruning finds the subtree \(T \subseteq T_0\) that minimizes (Eq. 8.4) \(\sum_{m=1}^{|T|}\sum_{i:\,x_i{\in}R_m}(y_i-\hat{y}_{R_m})^2 + \alpha|T|\)

where

\(\alpha \geq 0\) is a tuning parameter

\(|T|\) is the number of terminal nodes in the subtree \(T\)

\(R_m\) is the rectangle/region (i.e., the subset of predictor space) corresponding to the \(m\)th terminal node

\(\hat{y}_{R_m}\) is the mean response for the training observations in \(R_m\)

  • the tuning parameter \(\alpha\) controls the trade-off between:

    1. the subtree’s complexity (the number of terminal nodes)

    2. the subtree’s fit to the training data

    (the sketch below traces how the best subtree shrinks as \(\alpha\) grows)
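A minimal sketch of Steps 1 and 2 (not from the notes), assuming scikit-learn and the synthetic `X`, `y` below: grow a large tree \(T_0\), then use `cost_complexity_pruning_path` to get the sequence of \(\alpha\) values at which the best subtree changes. Note that scikit-learn's `ccp_alpha` penalizes a per-sample (normalized) version of the RSS, so its numeric scale differs from \(\alpha\) in Eq. 8.4 by roughly a factor of \(n\).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic data (an assumption, not from the notes)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                    # two predictors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Step 1: grow a very large tree T_0
big_tree = DecisionTreeRegressor(min_samples_leaf=2, random_state=0).fit(X, y)

# Step 2: cost-complexity pruning path -- the sequence of alphas at which
# the best subtree of T_0 changes (a nested sequence of subtrees)
path = big_tree.cost_complexity_pruning_path(X, y)

# Larger alpha => heavier penalty on |T| => smaller subtree
for alpha in path.ccp_alphas[::5]:
    subtree = DecisionTreeRegressor(ccp_alpha=alpha, min_samples_leaf=2,
                                    random_state=0).fit(X, y)
    print(f"alpha={alpha:.4f}  terminal nodes |T|={subtree.get_n_leaves()}")
```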

  3. Choose \(\alpha\) using K-fold cross-validation (see the sketch after this list)

    • 3.1) repeat Steps 1 and 2 on all but the \(k\)th fold, i.e., on each \((K-1)/K\) fraction of the training data

    • 3.2) evaluate predictions on the held-out fold, average the results across folds, and pick \(\alpha\) to minimize the average MSE

      • Recall that in K-fold cross-validation (say K = 5): the model is estimated on 80% of the data five different times, the predictions are made for the remaining 20%, and the test MSEs are averaged

  4. Return the subtree from Step 2 that corresponds to the chosen value of \(\alpha\)
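A minimal sketch of Steps 3 and 4 under the same assumptions (it reuses `X`, `y`, and `path` from the previous sketch): for each candidate \(\alpha\), `cross_val_score` regrows and prunes the tree on each set of \(K-1\) folds and computes the MSE on the held-out fold; the \(\alpha\) with the smallest average MSE is kept and the final subtree is refit on the full training data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Step 3: K-fold cross-validation over the candidate alphas
K = 5
cv_mse = []
for alpha in path.ccp_alphas:
    tree = DecisionTreeRegressor(ccp_alpha=alpha, min_samples_leaf=2,
                                 random_state=0)
    # negative MSE on each held-out fold, averaged over the K folds
    scores = cross_val_score(tree, X, y, cv=K,
                             scoring="neg_mean_squared_error")
    cv_mse.append(-scores.mean())

best_alpha = path.ccp_alphas[int(np.argmin(cv_mse))]

# Step 4: return the subtree of T_0 corresponding to the chosen alpha
pruned_tree = DecisionTreeRegressor(ccp_alpha=best_alpha, min_samples_leaf=2,
                                    random_state=0).fit(X, y)
print(f"chosen alpha={best_alpha:.4f}, "
      f"terminal nodes |T|={pruned_tree.get_n_leaves()}")
```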