8.12 Classification trees (continued)

  • So, two other measures are preferable:

    • The Gini index, defined by \[G = \sum_{k=1}^{K}\hat{p}_{mk}(1-\hat{p}_{mk})\] is a measure of total variance across the \(K\) classes

    • The Gini index takes on a small value if all of the \(\hat{p}_{mk}\)’s are close to 0 or 1

    • For this reason, the Gini index is referred to as a measure of node purity: a small value indicates that a node contains predominantly observations from a single class

    • An alternative to the Gini index is the cross-entropy, given by

\[D = - \sum_{k=1}^{K}\hat{p}_{mk}\log(\hat{p}_{mk})\]

  • Like the Gini index, the cross-entropy is near zero when the \(\hat{p}_{mk}\)’s are all close to 0 or 1, and the two measures are very similar numerically
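
To make the two formulas concrete, here is a minimal Python sketch that evaluates \(G\) and \(D\) for a single node. The function names `gini_index` and `cross_entropy` and the example class proportions are illustrative assumptions, not part of the notes. The sketch also assumes the natural logarithm and the usual convention that \(0 \log 0 = 0\).

```python
import math


def gini_index(proportions):
    """Gini index G = sum_k p_k * (1 - p_k) for the class proportions of one node."""
    return sum(p * (1 - p) for p in proportions)


def cross_entropy(proportions):
    """Cross-entropy D = -sum_k p_k * log(p_k) for the class proportions of one node.

    Classes with proportion 0 are skipped (assumed convention: 0 * log(0) = 0).
    """
    return sum(-p * math.log(p) for p in proportions if p > 0)


# Hypothetical nodes: each list holds the class proportions p_mk and sums to 1.
nodes = {
    "pure": [1.0, 0.0],
    "nearly pure": [0.9, 0.1],
    "evenly mixed": [0.5, 0.5],
}

for name, proportions in nodes.items():
    g = gini_index(proportions)
    d = cross_entropy(proportions)
    print(f"{name}: Gini = {g:.4f}, cross-entropy = {d:.4f}")
```

For the pure node both measures are exactly 0, and for the evenly mixed two-class node they reach their maxima, \(G = 0.5\) and \(D = \log 2 \approx 0.693\). Both shrink toward 0 as the node becomes purer.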