9.1 Performance Metrics and Inference
In this chapter, we discuss how to judge the quality of a model, applying several functions from the yardstick package.
library(tidyverse)
library(tidymodels)
tidymodels_prefer()
library(DiagrammeR)
library(viridis)
The yardstick package focuses on measuring how well a model performs: it provides functions that compute performance metrics from observed and predicted values.
yardstick: Tidy Characterizations of Model Performance
Identification of the quality of a model:
The main takeaway of this chapter is how to judge model effectiveness, that is, how to assess the effectiveness of the modeling procedure.
Difficulties may arise when the observed and predicted values are expressed in different units, because the differences between them are then not directly comparable.
In particular, transformations can be applied to put the observed values on a common scale. If a transformation has already been applied to some variables in the observed data, it is important to identify which one, so that the model is specified correctly and its results are read on the right scale.
This is one reason why model metrics are so important: they summarize the results of a model in a single number that can be compared.
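The point about units and transformations can be illustrated with a minimal sketch. The values in `preds` are made up for illustration and do not come from the chapter; the sketch only shows that the same predictions give very different RMSE values depending on the scale on which they are measured.

```r
library(dplyr)
library(yardstick)

# Hypothetical observed and predicted values on the original scale
# (made-up numbers, for illustration only)
preds <- tibble(
  observed  = c(100, 1000, 10000),
  predicted = c(110, 900, 12000)
)

# RMSE in the original units: dominated by the largest observation
rmse_original <- rmse(preds, truth = observed, estimate = predicted)

# RMSE after a log10 transformation of both columns: now in log10 units
preds_log <- mutate(preds, across(c(observed, predicted), log10))
rmse_log <- rmse(preds_log, truth = observed, estimate = predicted)

# Stack the two results, labelling each row with the scale it was computed on
bind_rows(original = rmse_original, log10 = rmse_log, .id = "scale")
```

The two estimates are not comparable with each other, which is why it helps to know which transformation, if any, is already in place before reading a metric.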
Different metrics can be used to summarize the results of a model fit, depending on whether the response variable is numeric or categorical, that is, on whether a regression or a classification model is fitted.
We can use:
- the Root Mean Squared Error (RMSE), a performance metric used in regression modeling.
- the Accuracy, the proportion of observations that are predicted correctly, used in classification modeling.
- the ROC curve and AUC, the receiver operating characteristic curve and the area under that curve, respectively, also used in classification modeling. The curve plots the Sensitivity against 1 - Specificity across classification thresholds.
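The metrics listed above can be sketched with yardstick as follows. This is a minimal example, assuming the `solubility_test` and `two_class_example` data sets that ship with yardstick; they are used here only for convenience and are not necessarily the data analysed in the chapter.

```r
library(yardstick)

# Regression: RMSE between observed and predicted solubility
reg_rmse <- rmse(solubility_test, truth = solubility, estimate = prediction)

# Classification: accuracy compares the observed classes
# with the hard class predictions
cls_accuracy <- accuracy(two_class_example, truth = truth, estimate = predicted)

# ROC curve and AUC use the predicted probability of the event class.
# By default yardstick treats the first factor level ("Class1") as the event.
cls_roc_curve <- roc_curve(two_class_example, truth, Class1)
cls_roc_auc <- roc_auc(two_class_example, truth, Class1)

reg_rmse
cls_accuracy
cls_roc_auc

# One row per threshold, with the specificity and sensitivity at that threshold
head(cls_roc_curve)
```

Each metric function returns a tibble with the columns `.metric`, `.estimator` and `.estimate`, so results from different metrics can be stacked and compared in the same way.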