Apartment-prices: Residual distribution

  • The residual distributions of the two models differ.

  • The residuals of the random forest:

    • They are centered around zero, so the predictions are, on average, close to the actual values.
    • The skewness indicates that, for some observations, the model substantially underestimated the actual value.
  • The residuals of the linear regression:

    • They are split into two separate, normal-like parts, centered at about -200 and 400, which may suggest the omission of a binary explanatory variable.
  • The random forest residuals seem to be centered at a value closer to zero than the linear-regression residuals, but they show larger variation.

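# mr_rf and mr_lm are assumed to be model-performance objects (e.g. from DALEX::model_performance())
plot(mr_rf, mr_lm, geom = "histogram") +
  ggplot2::geom_vline(xintercept = 0)  # reference line at zero residual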
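
The remark that a two-part residual distribution may point to an omitted binary explanatory variable can be illustrated with a small simulation. This is only a sketch on made-up data, not the apartment-prices data: the variable names (x, b, y), the coefficients, and the 1/3 share of the binary group are all assumptions chosen so that the two residual clusters land near -200 and 400.

```r
# Illustrative simulation only; names and numbers are hypothetical.
set.seed(1)
n <- 1000
x <- runif(n, 20, 150)          # continuous explanatory variable
b <- rbinom(n, 1, 1 / 3)        # binary explanatory variable
y <- 5000 - 10 * x + 600 * b + rnorm(n, sd = 50)

fit_without <- lm(y ~ x)        # binary variable omitted
fit_with    <- lm(y ~ x + b)    # binary variable included

res_without <- residuals(fit_without)
res_with    <- residuals(fit_with)

# Mean residual within each level of b: without b in the model,
# the two groups sit on opposite sides of zero.
group_means <- tapply(res_without, b, mean)
print(group_means)

# Histogram of the residuals from the model that omits b
hist(res_without, breaks = 50,
     main = "Residuals with the binary variable omitted",
     xlab = "Residual")
abline(v = 0)
```

In this setup the histogram of res_without shows two separate, roughly normal clusters, while res_with is unimodal and much narrower. On real data, a comparable check is to plot the residuals against each candidate binary variable and see whether the two clusters separate.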