Feature Importance
This section references Section 8.1.1 of Christoph Molnar’s Interpretable Machine Learning book.
Partial dependence-based (PD-based) feature importance can be measured as follows:
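$$I(x_S) = \sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(\hat{f}_S\left(x_S^{(k)}\right) - \frac{1}{K}\sum_{k=1}^{K}\hat{f}_S\left(x_S^{(k)}\right)\right)^2}$$
where $x_S^{(k)}$, $k = 1, \dots, K$, are the $K$ unique values of feature $X_S$ and $\hat{f}_S$ is the PD function of that feature.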
The formula measures how much the PD profile values vary around the average PD value, i.e. it is their sample standard deviation.
Main idea: A flat PD profile indicates a feature that is not important.
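The calculation can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the helper names `pd_profile` and `pd_importance`, the toy model, and the synthetic data are all assumptions made for the example. The PD value at each unique feature value is estimated by forcing every row to that value and averaging the predictions.

```python
import numpy as np

def pd_profile(predict, X, feature):
    """Return the unique values of one feature and the PD value at each.

    `predict` is any function mapping an (n, p) array to n predictions.
    """
    grid = np.unique(X[:, feature])          # the K unique values x_S^(k)
    pd_values = np.empty(len(grid))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value            # force every row to this value
        pd_values[i] = predict(X_mod).mean() # average over the other features
    return grid, pd_values

def pd_importance(predict, X, feature):
    """Sample standard deviation (1 / (K - 1)) of the PD profile values."""
    _, pd_values = pd_profile(predict, X, feature)
    return np.std(pd_values, ddof=1)

# Hypothetical toy setup: the model depends only on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

def toy_model(X):
    return 3.0 * X[:, 0]

imp0 = pd_importance(toy_model, X, 0)  # PD profile has slope 3, so > 0
imp1 = pd_importance(toy_model, X, 1)  # PD profile is flat, so 0
print(imp0, imp1)
```

Feature 1 never enters the toy model, so its PD profile is flat and its importance is zero, which is the main idea stated above.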
Limitations:
- Only captures main effects, ignores feature interactions
- Defined over the unique values of the explanatory variable. A unique feature value with just one instance is given the same weight as a value with many instances.
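Both limitations can be reproduced on small synthetic examples. The sketch below is illustrative only: the helper `pd_importance`, the two toy models, and the data are assumptions chosen to make each effect visible, not part of the referenced text.

```python
import numpy as np

def pd_importance(predict, X, feature):
    """Sample standard deviation of the PD values over the unique feature values."""
    grid = np.unique(X[:, feature])
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        pd_values.append(predict(X_mod).mean())
    return np.std(pd_values, ddof=1)

# Limitation 1: a pure interaction is invisible to the PD profile.
# Feature 1 is balanced between -1 and +1, so its effect averages out.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = np.tile([-1.0, 1.0], 100)
X_inter = np.column_stack([x0, x1])

def interaction_model(X):
    return X[:, 0] * X[:, 1]   # feature 0 matters, but only through feature 1

imp_interaction = pd_importance(interaction_model, X_inter, 0)

# Limitation 2: each unique value gets equal weight, however rare it is.
def identity_model(X):
    return X[:, 0]

X_balanced = np.array([0.0] * 50 + [10.0] * 50).reshape(-1, 1)
X_skewed = np.array([0.0] * 99 + [10.0] * 1).reshape(-1, 1)

imp_balanced = pd_importance(identity_model, X_balanced, 0)
imp_skewed = pd_importance(identity_model, X_skewed, 0)

print(imp_interaction, imp_balanced, imp_skewed)
```

In the first case feature 0 changes every prediction, yet its PD profile is flat and the importance is (numerically) zero. In the second case the value 10 appears in 50 rows of one dataset and in a single row of the other, but both datasets have the same two unique values and therefore the same importance.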