3.4 Feature filtering
Zero and near-zero variance variables are low-hanging fruit to eliminate.
Zero variance - the feature contains only a single unique value, so it has no predictive power.
Near-zero variance - the feature is dominated by one value and has few unique values relative to the sample size, so it offers very little, if any, information to a model.
# Flag near-zero variance features (assumes dplyr is loaded for %>% and filter())
caret::nearZeroVar(ames_train, saveMetrics = TRUE) %>%
  tibble::rownames_to_column() %>%
  filter(nzv)
##               rowname  freqRatio percentUnique zeroVar  nzv
##  1             Street  226.66667    0.09760859   FALSE TRUE
##  2              Alley   24.25316    0.14641288   FALSE TRUE
##  3       Land_Contour   19.50000    0.19521718   FALSE TRUE
##  4          Utilities 1023.00000    0.14641288   FALSE TRUE
##  5         Land_Slope   22.15909    0.14641288   FALSE TRUE
##  6        Condition_2  202.60000    0.34163006   FALSE TRUE
##  7          Roof_Matl  144.35714    0.39043436   FALSE TRUE
##  8          Bsmt_Cond   20.24444    0.29282577   FALSE TRUE
##  9     BsmtFin_Type_2   25.85294    0.34163006   FALSE TRUE
## 10       BsmtFin_SF_2  453.25000    9.37042460   FALSE TRUE
## 11            Heating  106.00000    0.29282577   FALSE TRUE
## 12    Low_Qual_Fin_SF 1010.50000    1.31771596   FALSE TRUE
## 13      Kitchen_AbvGr   21.23913    0.19521718   FALSE TRUE
## 14         Functional   38.89796    0.39043436   FALSE TRUE
## 15     Enclosed_Porch  102.05882    7.41825281   FALSE TRUE
## 16 Three_season_porch  673.66667    1.12249878   FALSE TRUE
## 17       Screen_Porch  169.90909    4.63640800   FALSE TRUE
## 18          Pool_Area 2039.00000    0.53684724   FALSE TRUE
## 19            Pool_QC  509.75000    0.24402147   FALSE TRUE
## 20       Misc_Feature   34.18966    0.24402147   FALSE TRUE
## 21           Misc_Val  180.54545    1.56173743   FALSE TRUE
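To make the freqRatio and percentUnique columns in the output above concrete, the following base-R sketch recomputes them for a small made-up data frame. The helper name nzv_metrics, the toy data, and the cutoffs (a frequency ratio above 95/5 and at most 10 percent unique values, chosen to mirror the documented defaults of caret::nearZeroVar()) are assumptions for illustration; this is not the package's internal code.

```r
# Toy data (made up): `street` is dominated by one level, `area` varies freely
toy <- data.frame(
  street = c(rep("Pave", 98), rep("Grvl", 2)),
  area   = seq(1000, 1990, by = 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recompute the two metrics for a single column
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  zero_var <- length(counts) == 1
  # Count of the most common value divided by the count of the runner-up
  freq_ratio <- if (zero_var) 0 else counts[[1]] / counts[[2]]
  # Distinct values as a percentage of the number of rows
  percent_unique <- 100 * length(counts) / length(x)
  data.frame(
    freqRatio     = freq_ratio,
    percentUnique = percent_unique,
    zeroVar       = zero_var,
    nzv           = zero_var |
      (freq_ratio > freq_cut & percent_unique <= unique_cut)
  )
}

# One row of metrics per column, with column names as row names
metrics <- do.call(rbind, lapply(toy, nzv_metrics))
print(metrics)

# Keep only the columns that were not flagged
toy_filtered <- toy[, !metrics$nzv, drop = FALSE]
```

Here street has a frequency ratio of 98 / 2 = 49 with 2 percent unique values, so it is flagged, while area is not. With caret itself, calling nearZeroVar() without saveMetrics = TRUE returns the positions of the flagged columns, which can be used to drop them from the data frame.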
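Other feature selection methods exist. They generally fall into three groups:
Filter methods (e.g., zero variance, near-zero variance, correlation)
Wrapper methods (e.g., forward selection, backward elimination, recursive feature elimination (RFE))
Embedded methods (e.g., lasso (L1) and ridge (L2) regularization; only the L1 penalty can shrink coefficients to exactly zero and so drop features outright)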
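As an illustration of the correlation filter named in the list above, here is a minimal base-R sketch on simulated data. The helper drop_correlated, the simulated columns, and the 0.9 cutoff are all assumptions made for this example. caret offers findCorrelation() for the same purpose; the sketch below is a simplified stand-in, not that function's algorithm.

```r
set.seed(123)
n  <- 200
x1 <- rnorm(n)
sim <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.05),  # nearly a copy of x1
  x3 = rnorm(n)                   # generated independently of x1 and x2
)

# Hypothetical helper: drop every column whose absolute correlation with
# any earlier column exceeds `cutoff`
drop_correlated <- function(df, cutoff = 0.9) {
  cors <- abs(cor(df))
  # Zero out the lower triangle and diagonal so each pair is checked once
  cors[lower.tri(cors, diag = TRUE)] <- 0
  flagged <- names(df)[apply(cors, 2, function(col) any(col > cutoff))]
  df[, setdiff(names(df), flagged), drop = FALSE]
}

sim_filtered <- drop_correlated(sim)
names(sim_filtered)
```

In this simulation x2 is removed because it is almost perfectly correlated with x1. Which member of a correlated pair to keep is a design choice; this sketch simply keeps the earlier column.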