3.1 Missing values treatment

Before deleting or imputing missing data, it is advisable to investigate what the missing values mean and what caused them. For the Ames housing data, the creators provide extensive documentation that explains, in most cases, why a variable has missing values.

For example, missing values in the variable Garage Type identify housing units without garages. The textbook confirms this by comparing Garage Type with Garage Cars (garage size in car capacity) and Garage Area (total garage area in square feet).

# Inspect the garage size variables for rows with a missing Garage Type
ames_raw %>%
  filter(is.na(`Garage Type`)) %>%
  select(`Garage Type`, `Garage Cars`, `Garage Area`)
## # A tibble: 157 × 3
##    `Garage Type` `Garage Cars` `Garage Area`
##    <chr>                 <int>         <int>
##  1 <NA>                      0             0
##  2 <NA>                      0             0
##  3 <NA>                      0             0
##  4 <NA>                      0             0
##  5 <NA>                      0             0
##  6 <NA>                      0             0
##  7 <NA>                      0             0
##  8 <NA>                      0             0
##  9 <NA>                      0             0
## 10 <NA>                      0             0
## # ℹ 147 more rows

Since the rows with a missing Garage Type shown above also report zero garage capacity and zero garage area, it is reasonable to impute these missing values with a label such as ‘none’ or ‘no garage’.
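
The imputation itself could be sketched as follows. This is only an illustration under stated assumptions: `toy` is a small made-up tibble standing in for `ames_raw`, its values are invented, and the label "No Garage" is an arbitrary choice. The sketch first summarises all rows with a missing Garage Type, rather than only the first ten printed, and then replaces the missing labels using `coalesce()` from dplyr.

```r
library(dplyr)

# Hypothetical toy data standing in for ames_raw (values invented for illustration)
toy <- tibble(
  `Garage Type` = c("Attchd", NA, "Detchd", NA),
  `Garage Cars` = c(2L, 0L, 1L, 0L),
  `Garage Area` = c(528L, 0L, 280L, 0L)
)

# Summarise every row with a missing Garage Type by its capacity/area combination;
# if only the (0, 0) combination appears, the "no garage" reading holds for all rows
toy %>%
  filter(is.na(`Garage Type`)) %>%
  count(`Garage Cars`, `Garage Area`)

# Replace missing labels with an explicit category ("No Garage" is an assumed label)
toy_imputed <- toy %>%
  mutate(`Garage Type` = coalesce(`Garage Type`, "No Garage"))
```

Applied to the real data, the same two steps would use `ames_raw` in place of `toy`. An explicit label lets models treat the absence of a garage as a category of its own, so these rows do not have to be dropped.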