8.11 Classification trees

A classification tree is very similar to a regression tree, except that it predicts a qualitative (rather than quantitative) response
We predict that each observation belongs to the most commonly occurring class of training observations in the region to which it belongs
In the classification setting, RSS cannot be used as a criterion for making the binary splits
A natural alternative to RSS is the classification error rate, i.e., the fraction of the training observations in that region that do not belong to the most common class:

$E = 1 - \max_k(\hat{p}_{mk})$

where $\hat{p}_{mk}$ is the proportion of training observations in the $m$th region that are from the $k$th class
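
As a minimal sketch of the two ideas above (majority-class prediction and the classification error rate for one region), the following Python snippet computes both from a list of training labels. The function name `region_summary` and the example labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def region_summary(labels):
    """Return (majority_class, error_rate) for the training labels in one region.

    error_rate is E = 1 - max_k(p_mk), where p_mk is the share of class k
    among the labels in this region.
    """
    counts = Counter(labels)
    # most_common(1) gives [(label, count)] for the most frequent class
    majority_class, majority_count = counts.most_common(1)[0]
    error_rate = 1 - majority_count / len(labels)
    return majority_class, error_rate


# Hypothetical region with 3 "yes" and 2 "no" training observations
labels_in_region = ["yes", "yes", "no", "yes", "no"]
majority, E = region_summary(labels_in_region)
print(majority, round(E, 3))
```

Here every observation falling in this region would be predicted as the majority class, and $E$ is the share of training observations in the region that this prediction gets wrong.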

However, the classification error rate is not sufficiently sensitive for growing the tree: a split can make a region purer without changing the majority class, and then $E$ changes little or not at all
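
To make the lack of sensitivity concrete, here is a small sketch with made-up class counts (not taken from the source). A parent region with 80 observations of one class and 20 of another is split into two child regions, one of which is perfectly pure. The helper names `error_rate` and `weighted_error` are hypothetical.

```python
def error_rate(class_counts):
    """Classification error rate E = 1 - max_k(p_mk) for one region."""
    total = sum(class_counts)
    return 1 - max(class_counts) / total


def weighted_error(children):
    """Error rate after a split: child error rates weighted by region size."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * error_rate(c) for c in children)


# Made-up counts per class: [class A, class B]
parent = [80, 20]
children = [[40, 0], [40, 20]]  # first child is pure, second is mixed

parent_E = error_rate(parent)
split_E = weighted_error(children)
print(round(parent_E, 3), round(split_E, 3))
```

Both values come out equal (up to floating-point rounding), even though the split isolates a pure region. Because class A remains the majority in both children, the same 20 observations are misclassified before and after the split, so $E$ gives no credit for the improvement in purity.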