Chapter 8 Feature engineering with recipes
Learning objectives:
- Define feature engineering.
- List reasons that feature engineering might be beneficial.
- Use the {recipes} package to create a simple feature engineering recipe.
- Use selectors from the {recipes} package to apply transformations to specific types of columns.
- List some advantages of using a recipe for feature engineering.
- Describe what happens when a recipe is prepared with recipes::prep().
- Use recipes::bake() to process a dataset.
- Recognize how to use recipes::step_unknown(), recipes::step_novel(), and recipes::step_other() to prepare factor variables.
- Explain how recipes::step_dummy() encodes qualitative data in a numeric format.
- Recognize techniques for dealing with large numbers of categories, such as feature hashing or encoding using the {embed} package (as described in a talk by Alan Feder at rstudio::global(2021)).
- Recognize methods for encoding ordered factors.
- Use recipes::step_interact() to add interaction terms to a recipe.
- Understand why some steps might only be applicable to training data.
- Recognize the functions from {recipes} and {themis} that are only applied to training data by default.
- Recognize that {recipes} includes functions for creating spline terms, such as step_ns().
- Recognize that {recipes} includes functions for feature extraction, such as step_pca().
- Use themis::step_downsample() to downsample data.
- Recognize other row-sampling steps from the {recipes} package.
- Use recipes::step_mutate() and recipes::step_mutate_at() for general {dplyr}-like transformations.
- Recall that the {textrecipes} package exists for text-specific feature-engineering steps.
- Understand that the functions of the {recipes} package use training data for all preprocessing and feature engineering steps to prevent leakage.
- Use {recipes} to prepare data for traditional modeling functions.
- Use tidy() to examine a recipe and its steps.
- Refer to columns with roles other than "predictor" or "outcome".
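
As a rough preview of several of these objectives, the sketch below defines a small recipe, prepares it on training data with prep(), applies it to new data with bake(), and inspects it with tidy(). The data frames `train` and `test` and their columns are hypothetical, invented only for illustration; the sketch also assumes R 4.1 or later for the native pipe and a reasonably recent {recipes} release that provides all_nominal_predictors() and all_numeric_predictors().

```r
library(recipes)

# Hypothetical toy data, made up for this illustration.
train <- data.frame(
  price = c(100, 150, 200, 250, 300, 350),
  area  = c(50, 60, 80, 90, 120, 130),
  type  = factor(c("house", "flat", "house", "flat", "house", "flat"))
)
test <- data.frame(
  price = c(180, 220),
  area  = c(70, 100),
  type  = factor(c("flat", "house"))
)

# Define the recipe: nothing is estimated at this point.
rec <- recipe(price ~ area + type, data = train) |>
  step_log(area, base = 10) |>                 # transform one named column
  step_dummy(all_nominal_predictors()) |>      # selector: factor predictors
  step_normalize(all_numeric_predictors())     # selector: numeric predictors

# prep() estimates what each step needs (factor levels, means, standard
# deviations) from the training data only.
prepped <- prep(rec, training = train)

# bake() applies the already-estimated steps to a dataset.
baked <- bake(prepped, new_data = test)

# tidy() summarizes the recipe, or a single step when `number` is given.
tidy(prepped)
tidy(prepped, number = 3)
```

Because the statistics used by step_normalize() come from `train` alone, baking `test` does not leak information from the test rows into the preprocessing.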