8.12 Classification trees (continued)
So, two other measures are preferable to the classification error rate when growing the tree
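The Gini index, defined by \[G = \sum_{k=1}^{K}\hat{p}_{mk}(1-\hat{p}_{mk}),\] is a measure of total variance across the \(K\) classes, where \(\hat{p}_{mk}\) is the proportion of training observations in the \(m\)th region that are from the \(k\)th class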
The Gini index takes on a small value if all of the \(\hat{p}_{mk}\)’s are close to 0 or 1
For this reason the Gini index is referred to as a measure of node purity: a small value indicates that a node contains predominantly observations from a single class
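As a minimal sketch of the purity idea, the Gini index can be computed directly from a node's class proportions. The function name `gini_index` and the example proportions are illustrative assumptions, not part of the original notes.

```python
def gini_index(proportions):
    """Gini index G = sum_k p_k * (1 - p_k) for one node.

    `proportions` is assumed to hold the class proportions p_mk
    of a single node, so the values should sum to 1.
    """
    return sum(p * (1.0 - p) for p in proportions)


# Hypothetical nodes with K = 3 classes.
pure_node = gini_index([1.0, 0.0, 0.0])        # every observation in one class
nearly_pure = gini_index([0.9, 0.05, 0.05])    # mostly one class
evenly_mixed = gini_index([1 / 3, 1 / 3, 1 / 3])

print(pure_node, nearly_pure, evenly_mixed)
```

The pure node gives the smallest possible value and the evenly mixed node the largest, which is the sense in which a small Gini index signals a pure node.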
An alternative to the Gini index is cross-entropy given by
\[D = - \sum_{k=1}^{K}\hat{p}_{mk}\log(\hat{p}_{mk})\]
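The Gini index and cross-entropy are very similar numerically: like the Gini index, cross-entropy takes a value near zero when all of the \(\hat{p}_{mk}\)’s are close to 0 or 1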
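To see how the two measures behave side by side, here is a small sketch that evaluates both for a two-class node over a few values of \(\hat{p}_{m1}\). It assumes the natural logarithm and the usual convention that \(0 \log 0 = 0\); the function names and the grid of proportions are illustrative choices only.

```python
import math


def gini_index(proportions):
    """Gini index G = sum_k p_k * (1 - p_k) for one node."""
    return sum(p * (1.0 - p) for p in proportions)


def cross_entropy(proportions):
    """Cross-entropy D = -sum_k p_k * log(p_k) for one node.

    Assumes the natural log; terms with p_k = 0 are skipped,
    which applies the convention 0 * log(0) = 0.
    """
    return sum(-p * math.log(p) for p in proportions if p > 0)


# Two-class node: the proportion of class 1 runs from pure to evenly mixed.
for p1 in (0.0, 0.01, 0.1, 0.3, 0.5):
    props = [p1, 1.0 - p1]
    print(f"p1={p1:.2f}  gini={gini_index(props):.3f}  "
          f"entropy={cross_entropy(props):.3f}")
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, so they tend to rank candidate splits in much the same way even though their scales differ.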