9.5 Code Examples in R

Package options

  • lime is a port of the Python library of the same name; it discretizes continuous variables into bins based on quartiles (see the sketch after this list),
  • localModel discretizes continuous variables by using ceteris-paribus profiles,
  • iml works directly on continuous variables; one of the package's authors is also the author of a popular book on interpretable machine learning.
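
As a minimal sketch of what quartile-based discretization means (illustrative R code, not lime's internal implementation; the variable and values are made up for the example):

# bin a continuous variable at its quartiles, in the spirit of lime's default
x <- c(2, 8, 15, 22, 30, 41, 55, 72)                 # example ages
breaks <- quantile(x, probs = seq(0, 1, by = 0.25))  # quartile cut points
cut(x, breaks = breaks, include.lowest = TRUE)       # each value mapped to its bin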

Notes:

  • The DALEXtra package is needed for predict_surrogate(), a uniform interface to all three packages.
  • The default method of predict_surrogate() is localModel.
  • The examples below use K-LASSO as the glass-box model.

# core libraries
library(randomForest)
library(DALEX)
library(DALEXtra)

# load data and models
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")
johnny_d <- archivist::aread("pbiecek/models/e3596")

# create a DALEX explainer for the random forest model
titanic_rf_exp <- DALEX::explain(model = titanic_rf,  
                        data = titanic_imputed[, -9],
                           y = titanic_imputed$survived == "yes", 
                       label = "Random Forest")
## Preparation of a new explainer is initiated
##   -> model label       :  Random Forest 
##   -> data              :  2207  rows  8  cols 
##   -> target variable   :  2207  values 
##   -> predict function  :  yhat.randomForest  will be used (  default  )
##   -> predicted values  :  No value for predict function target column. (  default  )
##   -> model_info        :  package randomForest , ver. 4.7.1.1 , task classification (  default  ) 
##   -> model_info        :  Model info detected classification task but 'y' is a logical . Converted to numeric.  (  NOTE  )
##   -> predicted values  :  numerical, min =  0 , mean =  0.2353095 , max =  1  
##   -> residual function :  difference between y and yhat (  default  )
##   -> residuals         :  numerical, min =  -0.892 , mean =  0.0868473 , max =  1  
##   A new explainer has been created!
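
Before fitting any surrogate, it may be useful to check the black-box prediction for Johnny D (DALEX explainers support predict(); the value, approximately 0.422, is the one reported in the lime output further down and may differ slightly across model versions):

# black-box prediction of survival probability for Johnny D
predict(titanic_rf_exp, johnny_d)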

Package: lime

Fit model:

library(lime)
set.seed(1)

# register methods that let lime work with DALEX explainers
model_type.dalex_explainer <- DALEXtra::model_type.dalex_explainer
predict_model.dalex_explainer <- DALEXtra::predict_model.dalex_explainer

lime_johnny <- predict_surrogate(explainer = titanic_rf_exp, 
                  new_observation = johnny_d, 
                  n_features = 3, 
                  n_permutations = 1000,
                  type = "lime")
model_type  case  model_r2  model_intercept  model_prediction  feature  feature_value  feature_weight  feature_desc   data                  prediction
regression  1     0.613     0.557            0.481             gender   2              -0.395          gender = male  2, 8, 1, 4, 72, 0, 0  0.422
regression  1     0.613     0.557            0.481             age      8               0.173          age <= 22      2, 8, 1, 4, 72, 0, 0  0.422
regression  1     0.613     0.557            0.481             class    1               0.146          class = 1st    2, 8, 1, 4, 72, 0, 0  0.422

Interpretable equation:

\[ \hat{p}_{\mathrm{lime}} = 0.557 - 0.395 \cdot 1_{\mathrm{gender = male}} + 0.173 \cdot 1_{\mathrm{age} \leq 22} + 0.146 \cdot 1_{\mathrm{class = 1st}} = 0.481. \]

Plot the lime model:

plot(lime_johnny)
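
A quick arithmetic check of the equation, using the intercept and feature weights from the lime output above:

# intercept + feature weights; should match model_prediction in the output
0.557 - 0.395 + 0.173 + 0.146   # = 0.481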

Package: localModel

library(localModel)

# build the localModel surrogate
locMod_johnny <- predict_surrogate(explainer = titanic_rf_exp, 
                  new_observation = johnny_d, 
                  size = 1000,
                  seed = 1,
                  type = "localModel")
estimated  variable                         original_variable
0.235      (Model mean)
0.619      (Intercept)
-0.402     gender = male                    gender
0.120      age <= 15.36                     age
0.156      class = 1st, 2nd, deck crew      class
-0.003     embarked = Belfast, Southampton  embarked
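
Reading this table as in the lime equation above (an interpretation of the printout, assuming the same additive structure), the glass-box prediction for Johnny D is the intercept plus the estimated effects:

# intercept + effects from the localModel output above (assumed additive)
0.619 - 0.402 + 0.120 + 0.156 - 0.003   # = 0.49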

A plot explaining how the continuous age variable was dichotomized:

# plot
plot_interpretable_feature(locMod_johnny, "age")

Glass-box explanation plot for Johnny D:

plot(locMod_johnny)

Package: iml

library(iml)

# surrogate model using the iml package
iml_johnny <- predict_surrogate(explainer = titanic_rf_exp, 
                  new_observation = johnny_d, 
                  k = 3, 
                  type = "iml",
                  seed = 1)
beta    x.recoded  effect  x.original  feature      .class
0.199   1           0.199  1st         class=1st    yes
-1.601  1          -1.601  male        gender=male  yes
0.000   72          0.015  72          fare         yes

Notes:

  • continuous variables are not transformed,
  • categorical variables are dichotomized, with value 1 for the observed category and 0 otherwise (see the sketch after this list).
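
A minimal sketch of this recoding for Johnny D, together with how the effect column arises (effect = beta * recoded value). The betas are copied from the table above; the fare beta is back-calculated from its effect, since the printout rounds it to 0.000 (illustrative code, not iml's internals):

# iml-style recoding for Johnny D
johnny_class <- "1st"; johnny_gender <- "male"; johnny_fare <- 72
x_class  <- as.numeric(johnny_class == "1st")    # 1 if observed category, else 0
x_gender <- as.numeric(johnny_gender == "male")  # 1 if observed category, else 0
x_fare   <- johnny_fare                          # continuous, left untransformed

# effect = beta * recoded value
c(class  =  0.199     * x_class,
  gender = -1.601     * x_gender,
  fare   = (0.015/72) * x_fare)   # fare beta ~ 0.015/72, shown as 0.000 above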

Glass-box explanation plot for Johnny D:

plot(iml_johnny)

Age, gender, and class are correlated, which may partially explain why the explanations differ somewhat across the various LIME implementations.
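
These associations can be inspected directly in the data used above (a quick check; titanic_imputed contains the gender, age, and class columns passed to the explainer):

# gender composition and mean age per class
table(titanic_imputed$gender, titanic_imputed$class)
tapply(titanic_imputed$age, titanic_imputed$class, mean)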