Feature Importance

This section references Section 8.1.1 of Christoph Molnar's *Interpretable Machine Learning* book.

We can measure partial-dependence-based feature importance as follows:

\[I(x_S) = \sqrt{\frac{1}{K-1}\sum_{k=1}^K\left(\hat{f}_S(x^{(k)}_S) - \frac{1}{K}\sum_{k=1}^K \hat{f}_S(x^{(k)}_S)\right)^2}\] where \(x^{(k)}_S\) are the \(K\) unique values of feature \(X_S\)

The formula calculates the variation of the PD profile values around the average PD value, i.e. the sample standard deviation of the PD profile.

Main idea: A flat PD profile indicates a feature that is not important.
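The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not Molnar's reference implementation: `pd_profile` and `pd_importance` are hypothetical helper names, the model is an assumed toy linear function, and the importance is computed as the sample standard deviation of the PD values, matching the formula.

```python
import numpy as np

def pd_profile(model, X, feature, grid):
    """PD profile: average prediction with `feature` fixed at each grid value."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v           # clamp the feature to the grid value
        pd_vals.append(model(Xv).mean())
    return np.array(pd_vals)

def pd_importance(model, X, feature):
    """PD-based importance: sample std of PD values over the unique feature values."""
    grid = np.unique(X[:, feature])  # K unique values of the feature
    pd_vals = pd_profile(model, X, feature, grid)
    # sqrt( 1/(K-1) * sum_k (pd_k - mean(pd))^2 )
    return np.std(pd_vals, ddof=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Toy model (assumption): feature 0 matters far more than feature 1
model = lambda X: 3 * X[:, 0] + 0.1 * X[:, 1]

print(pd_importance(model, X, 0))  # steep PD profile, large importance
print(pd_importance(model, X, 1))  # near-flat PD profile, small importance
```

A flat PD profile yields PD values that barely vary around their mean, so the importance is close to zero, which matches the main idea stated above.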

Limitations:

  • Only captures main effects, ignores feature interactions
  • Defined over the unique values of the explanatory variable. A value that occurs in just one instance is given the same weight as a value that occurs in many instances.
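The first limitation can be demonstrated with a pure-interaction model. This is a hedged sketch under assumed data (standard normal features) and a toy model `y = x0 * x1`: each feature's PD profile is nearly flat because averaging over the other feature cancels the interaction, so the PD-based importance reports both features as unimportant even though together they drive all of the variation in the predictions.

```python
import numpy as np

def pd_importance(model, X, feature):
    """Sample std of the PD profile over the feature's unique values."""
    grid = np.unique(X[:, feature])
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(model(Xv).mean())
    return np.std(pd_vals, ddof=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
# Pure interaction (assumption): predictions depend on both features jointly
model = lambda X: X[:, 0] * X[:, 1]

# Predictions vary substantially ...
print(np.std(model(X)))
# ... yet the PD-based importance of each feature is near zero,
# because E[x0 * x1 | x0 = v] ~ v * mean(x1) ~ 0 for centered data
print(pd_importance(model, X, 0))
print(pd_importance(model, X, 1))
```

This is exactly the "main effects only" failure mode: the measure cannot distinguish a truly irrelevant feature from one whose effect appears only through interactions.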