Feature Importance

This section references Section 8.1.1 of Christoph Molnar’s book Interpretable Machine Learning.

Partial dependence (PD)-based feature importance can be measured as follows:

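$$I(x_S) = \sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(\hat{f}_S\big(x_S^{(k)}\big) - \frac{1}{K}\sum_{k=1}^{K}\hat{f}_S\big(x_S^{(k)}\big)\right)^2}$$

where $x_S^{(k)}$, $k = 1, \dots, K$, are the $K$ unique values of feature $X_S$ and $\hat{f}_S$ is the PD function of that feature.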

The formula measures how much the PD profile values vary around the average PD value.

Main idea: A flat PD profile indicates a feature that is not important.
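
The idea can be sketched in a few lines of plain Python. This is only an illustration, not a reference implementation: the helper names `partial_dependence` and `pd_importance`, the toy model `predict`, and the data `X` are all made up for the example. The sketch assumes the importance is the sample standard deviation (dividing by K - 1) of the PD profile evaluated at the unique values of the feature.

```python
import math

def partial_dependence(predict, X, feature, grid):
    """PD profile: for each grid value, fix column `feature` to that value
    in every row and average the model predictions."""
    profile = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)       # copy so the data is not mutated
            modified[feature] = value
            total += predict(modified)
        profile.append(total / len(X))
    return profile

def pd_importance(profile):
    """Sample standard deviation (K - 1 in the denominator) of the PD profile."""
    k = len(profile)
    mean = sum(profile) / k
    return math.sqrt(sum((p - mean) ** 2 for p in profile) / (k - 1))

# Hypothetical toy model: the prediction depends on feature 0 only.
def predict(row):
    return 2.0 * row[0]

# Hypothetical toy data with two features.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 3.0], [3.0, 1.0]]

# The grid is the set of unique observed values of each feature.
grid0 = sorted({row[0] for row in X})
grid1 = sorted({row[1] for row in X})

imp0 = pd_importance(partial_dependence(predict, X, 0, grid0))
imp1 = pd_importance(partial_dependence(predict, X, 1, grid1))

print(imp0)  # positive: the PD profile of feature 0 is a sloped line
print(imp1)  # zero: the PD profile of feature 1 is flat
```

In this toy setup the PD profile of feature 1 is constant, so its importance is zero, while feature 0 has a sloped profile and a positive importance.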

Limitations:

  • Only captures main effects and ignores feature interactions.
  • Defined over the unique values of the explanatory variable, so a feature value that occurs in just one instance is given the same weight as a value that occurs in many instances.
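
Both limitations can be reproduced on small synthetic examples. The sketch below is illustrative only: the helpers `pd_profile` and `pd_importance`, the two toy models, and all data sets are hypothetical, and the importance is again assumed to be the sample standard deviation of the PD profile over the unique feature values.

```python
import math

def pd_profile(predict, X, feature):
    """PD profile of `feature`, evaluated at its unique observed values."""
    grid = sorted({row[feature] for row in X})
    profile = []
    for value in grid:
        preds = [predict(row[:feature] + [value] + row[feature + 1:]) for row in X]
        profile.append(sum(preds) / len(preds))
    return grid, profile

def pd_importance(profile):
    """Sample standard deviation (K - 1 in the denominator) of the PD profile."""
    k = len(profile)
    mean = sum(profile) / k
    return math.sqrt(sum((p - mean) ** 2 for p in profile) / (k - 1))

# Limitation 1: a pure interaction. Feature 1 clearly changes the prediction,
# but because feature 0 is balanced between -1 and +1, the averaged PD profile
# of feature 1 is flat.
def interaction_model(row):
    return row[0] * row[1]

X_inter = [[-1.0, 1.0], [1.0, 1.0],
           [-1.0, 2.0], [1.0, 2.0],
           [-1.0, 3.0], [1.0, 3.0]]
_, profile_x1 = pd_profile(interaction_model, X_inter, 1)
importance_x1 = pd_importance(profile_x1)

# Limitation 2: only unique values matter. The value 1.0 occurs in half of the
# rows of X_balanced but in a single row out of 100 in X_skewed.
def identity_model(row):
    return row[0]

X_balanced = [[0.0], [1.0]]
X_skewed = [[0.0] for _ in range(99)] + [[1.0]]
imp_balanced = pd_importance(pd_profile(identity_model, X_balanced, 0)[1])
imp_skewed = pd_importance(pd_profile(identity_model, X_skewed, 0)[1])

print(importance_x1)             # zero despite the feature driving predictions
print(imp_balanced, imp_skewed)  # identical despite very different frequencies
```

The first case shows an influential feature receiving zero importance because its effect exists only through an interaction. The second shows that how often a value occurs in the data does not affect the result.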