Random Forest: Residuals \(r_i\) as a function of observed values

As with the linear-regression model, we would like the residuals of the random forest model to be homoscedastic, i.e., to have a constant variance.

The plot suggests that the predictions are shifted (biased) towards the average.

  • For large observed values, the residuals are positive.
  • For small observed values, the residuals are negative.
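
library("DALEX")

# Compute residual diagnostics for the random forest explainer
md_rf <- model_diagnostics(explain_apart_rf)

# Plot residuals against the observed values of the dependent variable
plot(md_rf, variable = "y", yvariable = "residuals")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'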
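
The link between predictions shifted towards the average and the sign of the residuals can be illustrated with a small simulation. The sketch below uses simulated data, not the data behind the plot. The shrinkage factor `shrink` and all numeric values are hypothetical, and residuals are assumed to be defined as \(r_i = y_i - \hat{y}_i\), which is the convention consistent with the two observations listed above.

```r
# Simulated illustration: predictions shrunk towards the mean of y
# produce residuals whose sign depends on the observed value.
set.seed(1)
y      <- rnorm(200, mean = 3500, sd = 900)   # hypothetical observed values
shrink <- 0.7                                 # hypothetical shrinkage factor
y_hat  <- mean(y) + shrink * (y - mean(y))    # predictions pulled towards the mean
r      <- y - y_hat                           # residuals r_i = y_i - y_hat_i

# Algebraically, r = (1 - shrink) * (y - mean(y)), so the sign of r follows y
large <- y > mean(y)
positive_for_large <- all(r[large] > 0)       # residuals for above-average y
negative_for_small <- all(r[!large] < 0)      # residuals for below-average y
trend <- cor(y, r)                            # strength of the linear trend
```

In this idealised setting the residuals are an exact linear function of the observed values. In a real model the trend is noisier, which is why a smoothed curve is overlaid on the plot.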

For models like linear regression, such heteroscedasticity of the residuals would be worrying. In random forest models, however, it may be less of a concern.
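
A simple numeric check of whether the spread of the residuals is constant might look as follows. This is only a sketch on simulated data with hypothetical values. It compares the absolute residuals with the predictions and is not a formal test of homoscedasticity.

```r
# Simulated illustration: residuals whose spread grows with the prediction
set.seed(2)
y_hat <- runif(300, min = 1000, max = 6000)      # hypothetical predictions
r     <- rnorm(300, mean = 0, sd = 0.1 * y_hat)  # standard deviation proportional to y_hat

# Rank correlation between |r| and y_hat; close to zero for constant variance
rho <- cor(abs(r), y_hat, method = "spearman")

# Ratio of residual standard deviations: upper vs. lower half of predictions
upper    <- y_hat > median(y_hat)
sd_ratio <- sd(r[upper]) / sd(r[!upper])
```

Values of `rho` clearly above zero, or of `sd_ratio` clearly different from one, point to a non-constant variance. For a fitted model, a comparable visual check could be obtained from the diagnostics object, for instance with `plot(md_rf, variable = "y_hat", yvariable = "abs_residuals")`, assuming the object provides the `y_hat` and `abs_residuals` columns.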