16.4 Model-specific implementation
Reviewing the documentation of the `vip::vi_model()` function shows how each supported package computes its model-specific importance measure; some methods use permutation (sampling without replacement) to assess importance.
Source package | Method description |
---|---|
lm | Uses the absolute value of the \(t\)-statistic as a measure of feature importance. |
glmnet | Uses the absolute value of the coefficients returned for a specific model, so it is important to standardize the features prior to fitting the model. We can specify which coefficients to return by passing a specific value of the penalty parameter via the `lambda` argument. |
earth | 1. type = "nsubsets" counts the number of model subsets that include each feature; variables included in more subsets are considered more important. 2. type = "rss" calculates the decrease in the RSS for each subset relative to the previous subset during earth()’s backward pass, sums these decreases over all subsets that include each variable, and scales the result so that the variable with the largest net decrease is reported as 100. 3. type = "gcv" uses the generalized cross-validation statistic (an approximation to leave-one-out cross-validation) and follows the same strategy as "rss", except that a variable can have a negative total importance because this measure does not always decrease. |
rpart | Records, for each variable, the sum of the goodness-of-split measures for every split in which it was the primary variable, plus the goodness (adjusted agreement) for all splits in which it was a surrogate. |
randomForest | 1. type = 1 (mean decrease in accuracy) records the out-of-bag prediction error (error rate or MSE) for each tree, repeats the calculation after permuting each predictor variable, computes the difference between the two errors, averages the differences over all trees, and normalizes the result by the standard deviation of the differences. 2. type = 2 (mean decrease in node impurity) is the total decrease in node impurity (Gini index or residual sum of squares) from splitting on the variable, averaged over all trees. |
gbm | 1. If type = "relative.influence" and distribution = "gaussian", returns the reduction of squared error attributable to each variable. 2. If distribution != "gaussian", returns the reduction in the sum of squared error attributable to each variable when predicting the gradient at each iteration, reporting the relative influence of each variable in reducing the loss function. 3. If type = "permutation", randomly permutes each predictor variable one at a time and computes the associated reduction in predictive performance using the entire training dataset. |
xgboost | 1. For linear models, the importance is the absolute magnitude of the linear coefficients. 2. For tree models, type = "gain" gives the fractional contribution of each feature to the model, based on the total gain of that feature’s splits (the sum of the improvements in accuracy brought by the feature to all the branches it is on). 3. For tree models, type = "cover" gives the number of observations related to each feature. 4. For tree models, type = "frequency" gives the percentage representing the relative number of times each feature has been used throughout the trees in the ensemble. |
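As a minimal sketch of how the table entries map onto code (assuming the `vip` and `randomForest` packages are installed; `mtcars` ships with base R), the same `vi_model()` call dispatches to each package’s native importance measure, with the method-specific options above passed through as extra arguments:

```r
# Minimal sketch, assuming vip and randomForest are installed.
library(vip)

# lm: importance is the absolute value of each coefficient's t-statistic.
fit_lm <- lm(mpg ~ ., data = mtcars)
vi_model(fit_lm)

# randomForest: choose between the two importance types from the table.
library(randomForest)
fit_rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
vi_model(fit_rf, type = 1)  # mean decrease in accuracy (permutation-based)
vi_model(fit_rf, type = 2)  # mean decrease in node impurity
```

Each call returns a tibble of variables and their importance scores, so results from different model classes can be compared side by side, keeping in mind that the scales of the underlying measures differ.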