5.5 Imbalanced Data

  • Undersample (aka downsample): Use fewer of the prevalent class (throw away data).
  • Oversample (aka upsample): Draw additional rows from the rare class by bootstrapping (sampling with replacement).
  • Up-weighting the rare class or down-weighting the prevalent class has roughly the same effect as resampling, without changing the number of records.
  • Data generation (e.g., SMOTE) can help by creating synthetic cases similar to existing rare-class records.
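
The four strategies above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the toy data (950 prevalent rows, 50 rare rows), the variable names, and the helper `smote_like` are all hypothetical, and `smote_like` is a simplified interpolation in the spirit of SMOTE rather than the full published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 950 prevalent-class rows (y=0), 50 rare-class rows (y=1).
X = rng.normal(size=(1000, 2))
y = np.array([0] * 950 + [1] * 50)

major_idx = np.flatnonzero(y == 0)
minor_idx = np.flatnonzero(y == 1)

# Undersample: keep a random subset of the prevalent class (no replacement),
# sized to match the rare class; the rest of the prevalent rows are discarded.
keep = rng.choice(major_idx, size=len(minor_idx), replace=False)
under_idx = np.concatenate([keep, minor_idx])
X_under, y_under = X[under_idx], y[under_idx]

# Oversample: bootstrap the rare class (with replacement) up to the size
# of the prevalent class; all prevalent rows are kept.
boot = rng.choice(minor_idx, size=len(major_idx), replace=True)
over_idx = np.concatenate([major_idx, boot])
X_over, y_over = X[over_idx], y[over_idx]

# Weighting: keep every row once, but give each rare row a larger weight
# so that both classes carry the same total weight.
weights = np.where(y == 1, len(major_idx) / len(minor_idx), 1.0)


def smote_like(X_minor, n_new, k, rng):
    """Simplified SMOTE-style generator (an assumption, not the full algorithm).

    Each synthetic row lies on the segment between a randomly chosen rare row
    and one of its k nearest rare-class neighbors.
    """
    # Pairwise Euclidean distances among rare-class rows.
    dist = np.linalg.norm(X_minor[:, None, :] - X_minor[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(X_minor), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_minor[base] + gap * (X_minor[chosen] - X_minor[base])


# Generate enough synthetic rare rows to balance the classes.
X_synth = smote_like(X[minor_idx], n_new=len(major_idx) - len(minor_idx), k=5, rng=rng)
```

The `weights` array is meant to be passed to a fitting routine that accepts per-row weights (many do, for example through a `sample_weight` argument in scikit-learn). For real work, a maintained implementation such as the SMOTE class in the imbalanced-learn package is likely a better choice than the simplified helper here.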