8.12 Classification trees (continued)

  • Classification error is not sufficiently sensitive for tree-growing, so two other measures are preferable in practice

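    • The Gini index, defined by $G = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk})$, is a measure of total variance across the $K$ classes, where $\hat{p}_{mk}$ is the proportion of training observations in the $m$th region that are from the $k$th class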

    • The Gini index takes on a small value if all of the $\hat{p}_{mk}$ values are close to 0 or 1

    • For this reason, the Gini index is referred to as a measure of node purity: a small value indicates that a node contains predominantly observations from a single class

    • An alternative to the Gini index is cross-entropy, given by

$$D = -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$$

  • The Gini index and cross-entropy are very similar numerically
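A minimal Python sketch of both measures for a single node, given its class proportions $\hat{p}_{mk}$. It also computes the classification error rate, $1 - \max_k \hat{p}_{mk}$, to illustrate why the two measures are preferred for growing trees. The helper names (`gini`, `cross_entropy`, `class_error`, `weighted_impurity`) and the node counts are hypothetical and chosen only for illustration. Cross-entropy is assumed to use the natural log.

```python
import math

def gini(p):
    # Gini index for one node: G = sum_k p_k * (1 - p_k)
    return sum(pk * (1.0 - pk) for pk in p)

def cross_entropy(p):
    # Cross-entropy for one node: D = -sum_k p_k * log(p_k), natural log.
    # Classes with p_k = 0 are skipped, since p * log(p) -> 0 as p -> 0.
    return sum(-pk * math.log(pk) for pk in p if pk > 0)

def class_error(p):
    # Classification error rate for one node: E = 1 - max_k p_k
    return 1.0 - max(p)

def proportions(counts):
    # Convert per-class counts in a node into class proportions p_k
    n = sum(counts)
    return [c / n for c in counts]

def weighted_impurity(children, measure):
    # Impurity of a split: each child's impurity weighted by its share of observations
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * measure(proportions(c)) for c in children)

# Hypothetical two-class parent node with 400 observations per class,
# and two candidate splits given as per-child class counts.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]

for name, split in [("A", split_a), ("B", split_b)]:
    print(name,
          round(weighted_impurity(split, class_error), 4),
          round(weighted_impurity(split, gini), 4),
          round(weighted_impurity(split, cross_entropy), 4))
```

Both candidate splits have the same weighted classification error (0.25). The Gini index (0.375 vs. about 0.333) and cross-entropy are both lower for split B, the split that produces a pure node. This is the extra sensitivity to node purity that makes them preferable for choosing splits.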