8.12 Classification trees (continued)
So, two other measures are preferable
The Gini index, defined by $G = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk})$, is a measure of total variance across the $K$ classes
The Gini index takes on a small value if all of the $\hat{p}_{mk}$'s are close to 0 or 1
For this reason the Gini index is referred to as a measure of node purity - a small value indicates that a node contains predominantly observations from a single class
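As a small worked illustration of the purity interpretation, consider a hypothetical two-class node ($K = 2$); the proportions below are made up for the example and do not come from any data set. A nearly pure node gives a small Gini index, while an evenly mixed node gives the largest value possible for two classes:

```latex
% Nearly pure node: \hat{p}_{m1} = 0.9, \hat{p}_{m2} = 0.1 (illustrative values)
G = 0.9(1 - 0.9) + 0.1(1 - 0.1) = 0.09 + 0.09 = 0.18

% Evenly mixed node: \hat{p}_{m1} = \hat{p}_{m2} = 0.5 (illustrative values)
G = 0.5(1 - 0.5) + 0.5(1 - 0.5) = 0.25 + 0.25 = 0.5
```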
An alternative to the Gini index is cross-entropy given by
$D = -\sum_{k=1}^{K} \hat{p}_{mk} \log(\hat{p}_{mk})$
- The Gini index and cross-entropy are very similar numerically
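To see how the two measures behave side by side, here is a minimal Python sketch that evaluates both formulas on a few class-proportion vectors. The function names (`gini_index`, `cross_entropy`) and the example proportions are assumptions made for illustration, not part of the original notes. The sketch uses the natural logarithm and treats $0 \log 0$ as 0, which is a common convention.

```python
import math


def gini_index(proportions):
    """Gini index G = sum_k p_k * (1 - p_k) for one node's class proportions."""
    return sum(p * (1.0 - p) for p in proportions)


def cross_entropy(proportions):
    """Cross-entropy D = -sum_k p_k * log(p_k), using the natural log.

    Terms with p_k == 0 are skipped, i.e. 0 * log(0) is treated as 0.
    """
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Hypothetical class proportions for three nodes (illustrative values only).
example_nodes = {
    "pure": [1.0, 0.0],
    "mostly one class": [0.9, 0.1],
    "evenly mixed": [0.5, 0.5],
}

for label, proportions in example_nodes.items():
    g = gini_index(proportions)
    d = cross_entropy(proportions)
    print(f"{label:>16}: Gini = {g:.3f}, cross-entropy = {d:.3f}")
```

Both measures are zero for the pure node and grow as the node becomes more mixed, so they tend to rank candidate nodes in the same way even though their scales differ.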