9.3 - Partitioning

CART (Classification and Regression Trees) uses binary recursive partitioning (it’s recursive because each split, or rule, depends on the splits above it).

For regression problems, the objective function to minimize is the total SSE as defined in the equation below:

\[\begin{equation} SSE = \sum_{i \in R_1}\left(y_i-c_1\right)^2+\sum_{i \in R_2}\left(y_i-c_2\right)^2 \end{equation}\]

where \(R_1\) and \(R_2\) are the two regions produced by the split, and \(c_1\) and \(c_2\) are the mean response values of the training observations falling in each region.
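To make the objective concrete, below is a minimal sketch (an illustration under assumptions, not any particular package’s implementation) of how the SSE-minimizing threshold for a single numeric feature could be found by exhaustive search; the names `sse` and `best_split_sse` are hypothetical.

```python
import numpy as np

def sse(y):
    """Sum of squared errors of y around its mean (the region's constant prediction c)."""
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split_sse(x, y):
    """Scan candidate thresholds on one numeric feature and return the
    (threshold, total SSE) pair minimizing SSE(R1) + SSE(R2)."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_total = None, np.inf
    # Candidate thresholds: midpoints between consecutive sorted feature values.
    for t in (x[:-1] + x[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        total = sse(left) + sse(right)
        if total < best_total:
            best_t, best_total = float(t), total
    return best_t, best_total

# Toy example: the response jumps between x = 4 and x = 5,
# so the SSE-minimizing threshold lands at 4.5.
x = np.arange(10, dtype=float)
y = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 5.0, 5.1, 4.9, 5.2, 5.0])
print(best_split_sse(x, y))
```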

For classification problems, the split is usually chosen to maximize the reduction in cross-entropy or in the Gini index (both measures of node impurity, shown below).
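For reference, with \(p_k\) denoting the proportion of observations in a node that belong to class \(k\), these two impurity measures are commonly written as:

\[\begin{equation} \text{Gini} = \sum_{k} p_k\left(1 - p_k\right), \qquad \text{cross-entropy} = -\sum_{k} p_k \log p_k \end{equation}\]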

Having found the best feature/split combination, the data are partitioned into two regions and the splitting process is repeated on each of the two regions (hence the name binary recursive partitioning). This process continues until a suitable stopping criterion is met (e.g., a maximum depth is reached or the tree becomes “too complex”), as sketched below.
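As a rough illustration of the recursion and a depth-based stopping rule (a sketch only, assuming a single numeric feature; `grow_tree`, `max_depth`, and `min_samples` are illustrative names rather than a specific library’s API):

```python
import numpy as np

def grow_tree(x, y, depth=0, max_depth=3, min_samples=5):
    """Binary recursive partitioning: find the SSE-minimizing threshold,
    split the data into two regions, and recurse on each region until a
    stopping criterion (maximum depth or too few observations) is met."""
    if depth >= max_depth or len(y) < min_samples:
        return {"prediction": float(y.mean())}  # leaf: constant (mean) prediction
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_total = None, np.inf
    for t in (x[:-1] + x[1:]) / 2:  # candidate thresholds between sorted values
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        total = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if total < best_total:
            best_t, best_total = float(t), total
    if best_t is None:  # no valid split (e.g., all feature values identical)
        return {"prediction": float(y.mean())}
    keep_left = x <= best_t
    return {
        "threshold": best_t,
        "left": grow_tree(x[keep_left], y[keep_left], depth + 1, max_depth, min_samples),
        "right": grow_tree(x[~keep_left], y[~keep_left], depth + 1, max_depth, min_samples),
    }
```

In a full implementation, the same search would run over every available feature at each node, and the winning feature/threshold pair defines that node’s split.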