8.11 Classification trees

  • A classification tree is very similar to a regression tree, except that it predicts a qualitative rather than a quantitative response

  • We predict that each observation belongs to the most commonly occurring class of training observations in the region to which it belongs

  • In the classification setting, RSS cannot be used as a criterion for making the binary splits

  • A natural alternative to RSS is the classification error rate, i.e., the fraction of the training observations in that region that do not belong to the most common class:

\[E = 1 - \max_k(\hat{p}_{mk})\]

where \(\hat{p}_{mk}\) is the proportion of training observations in the \(m\)th region that are from the \(k\)th class

  • However, the classification error rate is not sufficiently sensitive for growing trees: a split can make the resulting regions noticeably purer while leaving \(E\) almost unchanged
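
To make the lack of sensitivity concrete, here is a minimal Python sketch that computes \(E\) for a region and compares a node before and after a split. The class counts, the split, and the helper names `classification_error` and `weighted_error` are hypothetical and chosen only for illustration; they are not taken from the text.

```python
from collections import Counter


def classification_error(labels):
    """E = 1 - max_k p_mk for the training labels that fall in one region."""
    counts = Counter(labels)
    return 1 - max(counts.values()) / len(labels)


def weighted_error(children):
    """Size-weighted average of E over the child regions of a split."""
    n = sum(len(child) for child in children)
    return sum(len(child) / n * classification_error(child) for child in children)


# Hypothetical node: 80 observations of class "A" and 20 of class "B".
parent = ["A"] * 80 + ["B"] * 20

# Hypothetical split: the left child is clearly purer than the parent,
# the right child is less pure, and both still predict class "A".
left = ["A"] * 40 + ["B"] * 5
right = ["A"] * 40 + ["B"] * 15

parent_error = classification_error(parent)
split_error = weighted_error([left, right])

print(f"parent E:         {parent_error:.3f}")
print(f"left child E:     {classification_error(left):.3f}")
print(f"right child E:    {classification_error(right):.3f}")
print(f"weighted E after: {split_error:.3f}")
```

In this made-up example the majority class is "A" in the parent and in both children, so the same 20 observations are misclassified before and after the split. The weighted error therefore stays the same even though the left child has become much purer, which is the sense in which \(E\) gives a tree-growing procedure little to work with.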