Code Snippets in R

We use the Titanic dataset and a random forest model, both loaded from a remote archivist repository.

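library("DALEX")
library("randomForest")
# Load the imputed Titanic data and the fitted random forest from archivist
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_rf <- archivist::aread("pbiecek/models/4e0fc")
# Wrap the model in an explainer; column 9 is dropped from data so that it
# holds predictors only (the response is passed separately as y)
explainer_rf <- DALEX::explain(model = titanic_rf,
                               data = titanic_imputed[, -9],
                               y = titanic_imputed$survived,
                               label = "Random Forest")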

Partial-dependence profiles

library("ggplot2")
pdp_rf <- model_profile(explainer = explainer_rf, variables = "age")
plot(pdp_rf) + ggtitle("Partial-dependence profile for age")

Only the explainer and variables arguments need to be supplied.
The optional argument N sets the number of observations sampled for the calculation; the default is 100.
The groups argument names a grouping variable when creating grouped PDP.
The k argument sets the number of clusters when creating clustered PDP; the clustering is hierarchical under the hood.
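
To make the role of N concrete, here is a minimal base-R sketch of what a partial-dependence profile computes: the average model prediction over a sample of observations, with the variable of interest set to each grid value in turn. It is not DALEX's implementation; the mtcars data, the linear model, and the helper pd_profile() are assumptions chosen so that the example runs on its own.

```r
# Illustrative sketch only: a hand-rolled partial-dependence profile.
# The model, data, and helper name are assumptions, not part of DALEX.
set.seed(1)
model <- lm(mpg ~ wt * hp, data = mtcars)

pd_profile <- function(model, data, variable, grid, N = 100) {
  # Use at most N randomly sampled observations
  idx <- sample(nrow(data), min(N, nrow(data)))
  sampled <- data[idx, , drop = FALSE]
  sapply(grid, function(z) {
    modified <- sampled
    modified[[variable]] <- z      # set the variable to z for every row
    mean(predict(model, newdata = modified))
  })
}

grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)
pd_wt <- pd_profile(model, mtcars, "wt", grid, N = 20)
```

A smaller N is cheaper to compute but gives a noisier average, because fewer individual profiles go into each point of the curve.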

Passing geom = "profiles" to plot() overlays the individual ceteris-paribus (CP) profiles, i.e. an ICE plot, on the partial-dependence profile:

plot(pdp_rf, geom = "profiles") + 
    ggtitle("Ceteris-paribus and partial-dependence profiles for age")

Clustered partial-dependence profiles

The clustering is done with the hclust() function:

pdp_rf_clust <- model_profile(explainer = explainer_rf, 
                              variables = "age", k = 3)

plot(pdp_rf_clust, geom = "profiles") + 
    ggtitle("Clustered partial-dependence profiles for age")
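
The idea behind clustered profiles can be sketched in base R: compute one ceteris-paribus profile per observation, cluster the profiles hierarchically, and average within each cluster. This is an illustration under assumptions (mtcars data, a linear model with an interaction, Euclidean distance, and the default linkage of hclust()); the settings DALEX uses internally may differ.

```r
# Illustrative sketch only: clustering ceteris-paribus profiles by hand.
model <- lm(mpg ~ wt * hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)

# One CP profile per observation: rows = observations, columns = grid points
cp <- t(sapply(seq_len(nrow(mtcars)), function(i) {
  obs <- mtcars[rep(i, length(grid)), ]
  obs$wt <- grid                   # vary wt, keep everything else fixed
  predict(model, newdata = obs)
}))

# Hierarchical clustering of the profiles, cut into k = 3 clusters
clusters <- cutree(hclust(dist(cp)), k = 3)

# Average profile within each cluster: one row per cluster
pd_by_cluster <- rowsum(cp, clusters) / as.vector(table(clusters))
```

Each row of pd_by_cluster plays the role of one clustered partial-dependence profile; matplot(grid, t(pd_by_cluster), type = "l") would draw them.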

Grouped partial-dependence profiles

Below we group by gender:

pdp_rf_gender <- model_profile(explainer = explainer_rf, 
                               variables = "age", groups = "gender")

plot(pdp_rf_gender, geom = "profiles") + 
    ggtitle("Partial-dependence profiles for age, grouped by gender")
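
As a rough illustration of what grouping does, the base-R sketch below computes a separate partial-dependence profile for each level of a categorical variable by averaging predictions only over the observations in that group. The mtcars data, the model, and the helper pd_for() are assumptions made for the sake of a self-contained example, not DALEX internals.

```r
# Illustrative sketch only: partial-dependence profiles computed per group.
mtcars2 <- transform(mtcars,
                     am = factor(am, labels = c("automatic", "manual")))
model <- lm(mpg ~ wt * am + hp, data = mtcars2)
grid <- seq(min(mtcars2$wt), max(mtcars2$wt), length.out = 20)

# Hypothetical helper: PD profile for wt over the rows of `data`
pd_for <- function(data) {
  sapply(grid, function(z) {
    data$wt <- z                   # modifies the local copy only
    mean(predict(model, newdata = data))
  })
}

# One column per group, one row per grid point
pd_by_group <- sapply(split(mtcars2, mtcars2$am), pd_for)
```

If the columns of pd_by_group have clearly different shapes, that suggests the effect of the variable depends on the group, i.e. an interaction.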

Contrastive partial-dependence profiles

library("rms")
titanic_lmr <- archivist::aread("pbiecek/models/58b24")
explainer_lmr <- DALEX::explain(model = titanic_lmr, 
                                data = titanic_imputed[, -9],
                                y = titanic_imputed$survived, 
                                label = "Logistic Regression")

pdp_lmr <- model_profile(explainer = explainer_lmr, variables = "age")
pdp_rf <- model_profile(explainer = explainer_rf, variables = "age")

plot(pdp_rf, pdp_lmr) +
    ggtitle("Partial-dependence profiles for age for two models")
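
The same comparison can be sketched without DALEX: compute the partial-dependence profile of one variable for two models on the same data and the same grid, then put the results side by side. The two models below (a linear and a quadratic fit on mtcars) and the helper pd() are assumptions used only to keep the example self-contained.

```r
# Illustrative sketch only: contrasting PD profiles of two models by hand.
linear <- lm(mpg ~ wt + hp, data = mtcars)
quadratic <- lm(mpg ~ poly(wt, 2) + hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)

# Hypothetical helper: PD profile for wt under a given model
pd <- function(model) {
  sapply(grid, function(z) {
    d <- mtcars
    d$wt <- z
    mean(predict(model, newdata = d))
  })
}

profiles <- data.frame(wt = grid,
                       linear = pd(linear),
                       quadratic = pd(quadratic))
```

Where the two columns diverge, the models disagree about the average effect of the variable, which is the kind of difference a contrastive plot is meant to expose.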