17.1 Effect of Encoding

We use the {embed} and {textrecipes} packages to transform categorical data into a numeric representation.

Tree-based models and naive Bayes models can handle categorical data natively, so for them the encoding step is not required.

For ordered categorical variables, one way to encode the levels as numbers is to apply polynomial transformations (linear, quadratic, and so on); this is what R does by default for ordered factors.
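As a rough illustration of the polynomial encoding, the sketch below uses only base R. The `size` factor and its levels are made up for the example and do not come from the chapter's data.

```r
# Hypothetical ordered factor, used only for illustration
size <- factor(
  c("small", "medium", "large", "medium"),
  levels = c("small", "medium", "large"),
  ordered = TRUE
)

# Polynomial contrast matrix for a factor with three ordered levels
contr.poly(3)

# With R's default contrast options, model.matrix() expands an ordered
# factor into a linear (.L) and a quadratic (.Q) column
mm <- model.matrix(~ size)
colnames(mm)
```

With three levels there are two contrast columns; more levels would add higher-order terms, which quickly become hard to interpret.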

In tidymodels there are some step functions designed for ordered factors, such as:

  • step_unorder(), which converts an ordered factor into a regular (unordered) factor
  • step_ordinalscore(), which assigns a specific numeric value to each level of the ordered factor
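A minimal sketch of how these two steps might be used in a recipe follows. The data frame `df`, its columns, and the values are hypothetical; the sketch assumes the {recipes} package is installed and R 4.1 or later for the native pipe.

```r
library(recipes)

# Hypothetical data: an ordered factor predictor and a numeric outcome
df <- data.frame(
  size = factor(
    c("small", "medium", "large", "medium"),
    levels = c("small", "medium", "large"),
    ordered = TRUE
  ),
  y = c(1.2, 2.5, 3.9, 2.1)
)

# step_ordinalscore(): replace each ordered level with a numeric score
# (with the default conversion, the position of the level: 1, 2, 3)
rec_score <- recipe(y ~ size, data = df) |>
  step_ordinalscore(size)
scored <- bake(prep(rec_score), new_data = NULL)

# step_unorder(): drop the ordering so the factor is treated as nominal,
# then build ordinary indicator columns with step_dummy()
rec_unorder <- recipe(y ~ size, data = df) |>
  step_unorder(size) |>
  step_dummy(size)
dummies <- bake(prep(rec_unorder), new_data = NULL)
```

Which of the two is preferable depends on whether the spacing between levels is meaningful for the model; the integer scores assume equally spaced levels.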

Categorical variables can be ordered or unordered. When a variable has a large number of categories, the way its levels are encoded becomes fundamental, and it can be challenging because it affects the resulting predictions. In particular, issues arise when the values we want to encode include infinite values, invalid values, NA, too many categorical levels, rare categorical levels, or new categorical levels. 3
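Some of these cases (NA, rare levels, new levels) can be handled with steps from {recipes}. The sketch below is one possible approach, not necessarily the one used in the chapter; the `train` and `new` data frames and the 20% threshold are assumptions made for illustration.

```r
library(recipes)

# Hypothetical training data: one common level, two rare levels, one NA
train <- data.frame(
  city = factor(c(rep("Rome", 8), "Milan", "Turin", NA)),
  y = 1:11
)

# Hypothetical new data with a level never seen during training
new <- data.frame(city = factor("Naples"), y = 12L)

# step_novel() maps unseen levels to "new";
# step_unknown() maps missing values to "unknown"
rec_levels <- recipe(y ~ city, data = train) |>
  step_novel(city) |>
  step_unknown(city)
prepped <- prep(rec_levels)
train_baked <- bake(prepped, new_data = NULL)
new_baked <- bake(prepped, new_data = new)

# step_other() pools levels below the threshold into "other"
# (the 0.2 threshold is arbitrary here)
rec_rare <- recipe(y ~ city, data = train) |>
  step_other(city, threshold = 0.2)
pooled <- bake(prep(rec_rare), new_data = NULL)
```

When several of these steps are combined in one recipe, their order matters, because a level created by one step can itself be pooled by a later one.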