1.17 Missing Data

There are three basic approaches to dealing with missing data: feature selection, listwise deletion, and imputation.

  • In feature selection, you delete variables (columns) that contain too many missing values.

  • Listwise deletion involves deleting observations (rows) that contain missing values on any of the variables of interest.

  • Imputation involves replacing missing values with “reasonable” guesses about what the values would have been if they had not been missing. Several approaches are available, implemented in R packages such as VIM, mice, Amelia, and missForest.
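
The three approaches can be sketched in base R. This is a minimal illustration, not a recommended workflow: the data frame, its column names, and the 50% cutoff are all made up for the example, and median imputation is used here only because it needs no extra packages. It is a much cruder method than those offered by the imputation packages listed above.

```r
# Hypothetical toy data; names and values are invented for illustration.
df <- data.frame(
  age    = c(23, NA, 31, 45, 52, 38),
  income = c(40, 52, NA, 61, 75, 58),
  score  = c(NA, NA, NA, NA, 7, 9)
)

# Proportion of missing values in each column.
pct_missing <- colMeans(is.na(df))

# 1. Feature selection: drop columns that are more than 50% missing
#    (the cutoff is an arbitrary assumption).
df_selected <- df[, pct_missing <= 0.5, drop = FALSE]

# 2. Listwise deletion: keep only rows with no missing values
#    on the remaining columns.
df_complete <- na.omit(df_selected)

# 3. Simple imputation: replace each missing value with the
#    median of the observed values in its column.
df_imputed <- df_selected
for (col in names(df_imputed)) {
  missing <- is.na(df_imputed[[col]])
  df_imputed[[col]][missing] <- median(df_imputed[[col]], na.rm = TRUE)
}
```

Here score is dropped by feature selection, listwise deletion removes the two rows with a missing age or income, and imputation keeps all six rows by filling in the gaps. Which trade-off is acceptable depends on how much data is missing and why.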