Feature Importance

This section references Section 8.1.1 of Christoph Molnar's *Interpretable Machine Learning* book.

We can measure partial-dependence-based feature importance as follows:

\[I(x_S) = \sqrt{\frac{1}{K-1}\sum_{k=1}^K\left(\hat{f}_S(x^{(k)}_S) - \frac{1}{K}\sum_{k=1}^K \hat{f}_S(x^{(k)}_S)\right)^2}\] where \(x^{(k)}_S\) are the \(K\) unique values of feature \(X_S\)

The formula calculates the variation of the PD profile values around the average PD value, i.e. the sample standard deviation of the PD profile.

Main idea: A flat PD profile indicates a feature that is not important.
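The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not Molnar's reference implementation: `pd_profile` and `pd_importance` are hypothetical helper names, the model is an assumed toy linear function, and the importance is computed as the sample standard deviation of the PD values, matching the formula.

```python
import numpy as np

def pd_profile(model, X, feature, grid):
    """PD profile: average prediction with `feature` fixed at each grid value."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v           # clamp the feature to the grid value
        pd_vals.append(model(Xv).mean())
    return np.array(pd_vals)

def pd_importance(model, X, feature):
    """PD-based importance: sample std of PD values over the unique feature values."""
    grid = np.unique(X[:, feature])  # K unique values of the feature
    pd_vals = pd_profile(model, X, feature, grid)
    # sqrt( 1/(K-1) * sum_k (pd_k - mean(pd))^2 )
    return np.std(pd_vals, ddof=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Toy model (assumption): feature 0 matters far more than feature 1
model = lambda X: 3 * X[:, 0] + 0.1 * X[:, 1]

print(pd_importance(model, X, 0))  # steep PD profile, large importance
print(pd_importance(model, X, 1))  # near-flat PD profile, small importance
```

A flat PD profile yields PD values that barely vary around their mean, so the importance is close to zero, which matches the main idea stated above.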

Limitations:

  • Only captures main effects, ignores feature interactions
  • Defined over the unique values of the explanatory variable. A value that occurs in just one instance is given the same weight as a value that occurs in many instances.
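The first limitation can be demonstrated with a pure-interaction model. This is a hedged sketch under assumed data (standard normal features) and a toy model `y = x0 * x1`: each feature's PD profile is nearly flat because averaging over the other feature cancels the interaction, so the PD-based importance reports both features as unimportant even though together they drive all of the variation in the predictions.

```python
import numpy as np

def pd_importance(model, X, feature):
    """Sample std of the PD profile over the feature's unique values."""
    grid = np.unique(X[:, feature])
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(model(Xv).mean())
    return np.std(pd_vals, ddof=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
# Pure interaction (assumption): predictions depend on both features jointly
model = lambda X: X[:, 0] * X[:, 1]

# Predictions vary substantially ...
print(np.std(model(X)))
# ... yet the PD-based importance of each feature is near zero,
# because E[x0 * x1 | x0 = v] ~ v * mean(x1) ~ 0 for centered data
print(pd_importance(model, X, 0))
print(pd_importance(model, X, 1))
```

This is exactly the "main effects only" failure mode: the measure cannot distinguish a truly irrelevant feature from one whose effect appears only through interactions.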