3.1 Missing values treatment
Before we start deleting or imputing missing data, it is advisable to investigate the meaning or cause of the missing values. In the AMES housing data, the creators provide extensive documentation explaining in most cases why variables have missing values.
For example, missing values in the variable Garage Type identify housing units without garages. The textbook confirms this insight by comparing Garage Type with Garage Cars (garage size in car capacity) and Garage Area (total garage area in square feet).
# Rows with a missing garage type, shown with the two garage size variables
ames_raw %>%
  filter(is.na(`Garage Type`)) %>%
  select(`Garage Type`, `Garage Cars`, `Garage Area`)
## # A tibble: 157 × 3
##    `Garage Type` `Garage Cars` `Garage Area`
##    <chr>                 <int>         <int>
##  1 <NA>                      0             0
##  2 <NA>                      0             0
##  3 <NA>                      0             0
##  4 <NA>                      0             0
##  5 <NA>                      0             0
##  6 <NA>                      0             0
##  7 <NA>                      0             0
##  8 <NA>                      0             0
##  9 <NA>                      0             0
## 10 <NA>                      0             0
## # ℹ 147 more rows
The logical conclusion is that we can impute the missing values of Garage Type with a label such as ‘none’ or ‘no garage’.
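One way this imputation could look is sketched below. It is only an illustration under stated assumptions: `toy` is a small hypothetical stand-in for `ames_raw` with made-up values, the label "No Garage" is just one of the possible choices, and `replace_na()` from tidyr is used here as one convenient option rather than the method the text prescribes.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for ames_raw (values are made up)
toy <- tibble(
  `Garage Type` = c("Attchd", NA, "Detchd", NA),
  `Garage Cars` = c(2L, 0L, 1L, 0L),
  `Garage Area` = c(480L, 0L, 240L, 0L)
)

# Check the assumption first: rows with a missing garage type
# should report zero cars and zero area
check <- toy %>%
  filter(is.na(`Garage Type`)) %>%
  summarise(all_zero = all(`Garage Cars` == 0 & `Garage Area` == 0))

# Replace the missing garage types with an explicit label
toy_imputed <- toy %>%
  mutate(`Garage Type` = replace_na(`Garage Type`, "No Garage"))
```

Running the check before the replacement keeps the imputation tied to the evidence: if some row had a missing type but a nonzero area, the label would be wrong for that row and the case would need a closer look.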