5.5 Imbalanced Data
- Undersample (aka downsample): Use fewer of the prevalent class (throw away data).
- Oversample (aka upsample): Bootstrap copies of the rare class (sample it with replacement).
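- Upweighting the rare class or downweighting the prevalent class in the loss can do roughly the same thing as resampling, without duplicating or discarding data.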
- Data generation (e.g., SMOTE) can help by creating synthetic cases similar to the rare class, interpolating between a rare case and its nearest rare-class neighbors.
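
The options above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names (`undersample`, `oversample`, `balanced_weights`, `smote_like`) and the toy 95/5 dataset are assumptions made for the example. The weighting formula is the common "balanced" heuristic, n / (n_classes * class_count). The generator is a simplified stand-in for SMOTE: it always interpolates toward the single nearest rare-class neighbor, whereas SMOTE proper picks at random among the k nearest.

```python
import math
import random


def undersample(majority, minority, rng):
    """Keep every rare row; keep a random subset of the prevalent class
    (drawn without replacement) of the same size. Assumes the prevalent
    class is at least as large as the rare class."""
    return rng.sample(majority, len(minority)) + minority


def oversample(majority, minority, rng):
    """Keep every prevalent row; bootstrap the rare class (draw with
    replacement) until it matches the prevalent class in size."""
    return majority + rng.choices(minority, k=len(majority))


def balanced_weights(labels):
    """Per-class weight n / (n_classes * count), which gives every class
    the same total weight without changing the data."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


def smote_like(minority, n_new, rng):
    """Simplified SMOTE-style generator (assumption: nearest neighbor only).
    Each synthetic point lies on the segment between a random rare point
    and its nearest other rare point. Needs at least two rare points."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = min((p for p in minority if p is not a),
                key=lambda p: math.dist(a, p))
        t = rng.random()  # position along the segment from a to b
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Toy data (hypothetical): 95 prevalent rows near (0, 0), 5 rare rows near (3, 3).
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]

down = undersample(majority, minority, rng)      # 5 prevalent + 5 rare rows
up = oversample(majority, minority, rng)         # 95 prevalent + 95 rare rows
weights = balanced_weights([0] * 95 + [1] * 5)   # rare class weighted 19x the prevalent class
synthetic = smote_like(minority, 90, rng)        # 90 new rare-like rows
```

Note the trade-offs this makes visible: undersampling shrinks 100 rows to 10, oversampling repeats each rare row about 19 times on average, and weighting leaves the data untouched while shifting each class to the same total weight. Whichever approach is used, it should be applied to the training split only, so that validation and test data keep the real class balance.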