Chapter 8 Feature engineering with recipes
Learning objectives:
- Define feature engineering.
- List reasons that feature engineering might be beneficial.
- Use the {recipes} package to create a simple feature engineering recipe.
- Use selectors from the {recipes} package to apply transformations to specific types of columns.
- List some advantages of using a recipe for feature engineering.
- Describe what happens when a recipe is prepared with recipes::prep().
- Use recipes::bake() to process a dataset.
- Recognize how to use recipes::step_unknown(), recipes::step_novel(), and recipes::step_other() to prepare factor variables.
- Explain how recipes::step_dummy() encodes qualitative data in a numeric format.
- Recognize techniques for dealing with large numbers of categories, such as feature hashing or encoding using the {embed} package (as described in this talk by Alan Feder at rstudio::global(2021)).
- Recognize methods for encoding ordered factors.
- Use recipes::step_interact() to add interaction terms to a recipe.
- Understand why some steps might only be applicable to training data.
- Recognize the functions from {recipes} and {themis} that are only applied to training data by default.
- Recognize that {recipes} includes functions for creating spline terms, such as step_ns().
- Recognize that {recipes} includes functions for feature extraction, such as step_pca().
- Use themis::step_downsample() to downsample data.
- Recognize other row-sampling steps from the {recipes} package.
- Use recipes::step_mutate() and recipes::step_mutate_at() for general {dplyr}-like transformations.
- Recall that the {textrecipes} package exists for text-specific feature-engineering steps.
- Understand that the functions of the {recipes} package use training data for all preprocessing and feature engineering steps to prevent leakage.
- Use {recipes} to prepare data for traditional modeling functions.
- Use tidy() to examine a recipe and its steps.
- Refer to columns with roles other than "predictor" or "outcome".
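
As a rough orientation for several of the objectives above (creating a recipe, using selectors, prep(), bake(), and tidy()), here is a minimal sketch. It is not taken from the chapter: the built-in iris data, the alternating-row train/test split, and the object names (train, test, rec, prepped, baked) are all assumptions chosen only to keep the example self-contained.

```r
library(recipes)

# Assumption: an alternating-row split of the built-in iris data,
# used here only for illustration (not a recommended resampling scheme).
train <- iris[seq(1, 150, by = 2), ]
test  <- iris[seq(2, 150, by = 2), ]

# Define the recipe: the outcome is Sepal.Length, everything else is a predictor.
rec <- recipe(Sepal.Length ~ ., data = train)

# Selectors apply a step to specific types of columns.
rec <- step_normalize(rec, all_numeric_predictors())  # center and scale numeric predictors
rec <- step_dummy(rec, all_nominal_predictors())      # encode Species as indicator columns

# prep() estimates the required quantities (means, standard deviations,
# factor levels) from the training data only.
prepped <- prep(rec, training = train)

# bake() applies the prepared steps to a dataset, here the held-out rows,
# using the statistics estimated from the training data.
baked <- bake(prepped, new_data = test)

# tidy() summarizes the recipe; number = 1 inspects the first step.
tidy(prepped)
tidy(prepped, number = 1)
```

Because the statistics come from train and are merely reused on test, nothing about the held-out rows influences the preprocessing, which is the leakage protection the objectives refer to.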