3.4 Feature filtering

Zero and near-zero variance variables are low-hanging fruit to eliminate.

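  • Zero variance - the feature contains only a single unique value, so it has no predictive power.

  • Near-zero variance - the feature is dominated by one value and offers very little, if any, information to a model. By default, caret::nearZeroVar() flags a feature when the ratio of its most common value's frequency to its second most common value's frequency (freqRatio) exceeds 95/5 and its distinct values make up no more than 10% of the observations (percentUnique).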
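To make the two statistics concrete, here is a minimal base R sketch of how they could be computed for a single vector. The helper `nzv_metrics()` and the toy `street` vector are hypothetical illustrations, not part of caret, and the handling of missing values is a simplifying assumption that may differ from caret's.

```r
# Hypothetical helper (not part of caret): approximates the statistics that
# caret::nearZeroVar(saveMetrics = TRUE) reports, for one vector.
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  n <- length(x)
  vals <- x[!is.na(x)]  # assumption: missing values are ignored when counting
  # Count occurrences of each distinct value, most common first
  counts <- sort(tabulate(match(vals, unique(vals))), decreasing = TRUE)

  # Frequency of the most common value over that of the second most common
  freq_ratio <- if (length(counts) >= 2) counts[1] / counts[2] else 0
  # Distinct values as a percentage of the number of observations
  percent_unique <- 100 * length(counts) / n
  zero_var <- length(counts) <= 1

  data.frame(
    freqRatio = freq_ratio,
    percentUnique = percent_unique,
    zeroVar = zero_var,
    nzv = zero_var || (freq_ratio > freq_cut && percent_unique <= unique_cut)
  )
}

# Toy example: 98 paved streets and 2 gravel streets
street <- c(rep("Pave", 98), rep("Grvl", 2))
nzv_metrics(street)
```

For this toy vector the sketch gives freqRatio = 49 and percentUnique = 2, so the feature is flagged as near-zero variance under the default cutoffs.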

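library(dplyr)  # provides %>% and filter()

# Compute the near-zero variance metrics and keep only the flagged features
caret::nearZeroVar(ames_train, saveMetrics = TRUE) %>%
  tibble::rownames_to_column() %>%
  filter(nzv)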
##               rowname  freqRatio percentUnique zeroVar  nzv
## 1              Street  226.66667    0.09760859   FALSE TRUE
## 2               Alley   24.25316    0.14641288   FALSE TRUE
## 3        Land_Contour   19.50000    0.19521718   FALSE TRUE
## 4           Utilities 1023.00000    0.14641288   FALSE TRUE
## 5          Land_Slope   22.15909    0.14641288   FALSE TRUE
## 6         Condition_2  202.60000    0.34163006   FALSE TRUE
## 7           Roof_Matl  144.35714    0.39043436   FALSE TRUE
## 8           Bsmt_Cond   20.24444    0.29282577   FALSE TRUE
## 9      BsmtFin_Type_2   25.85294    0.34163006   FALSE TRUE
## 10       BsmtFin_SF_2  453.25000    9.37042460   FALSE TRUE
## 11            Heating  106.00000    0.29282577   FALSE TRUE
## 12    Low_Qual_Fin_SF 1010.50000    1.31771596   FALSE TRUE
## 13      Kitchen_AbvGr   21.23913    0.19521718   FALSE TRUE
## 14         Functional   38.89796    0.39043436   FALSE TRUE
## 15     Enclosed_Porch  102.05882    7.41825281   FALSE TRUE
## 16 Three_season_porch  673.66667    1.12249878   FALSE TRUE
## 17       Screen_Porch  169.90909    4.63640800   FALSE TRUE
## 18          Pool_Area 2039.00000    0.53684724   FALSE TRUE
## 19            Pool_QC  509.75000    0.24402147   FALSE TRUE
## 20       Misc_Feature   34.18966    0.24402147   FALSE TRUE
## 21           Misc_Val  180.54545    1.56173743   FALSE TRUE
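Once the flagged features are known, they still have to be removed from the data. One possible way to do this, sketched below, is to call `nearZeroVar()` without `saveMetrics`, in which case it returns the column positions of the flagged features. The `toy` data frame and its column names are hypothetical stand-ins for `ames_train`.

```r
library(caret)

# Hypothetical toy data: one dominated feature, one informative feature,
# and one constant feature
toy <- data.frame(
  street   = c(rep("Pave", 98), rep("Grvl", 2)),
  lot_area = seq(5000, by = 100, length.out = 100),
  constant = 1
)

# With saveMetrics = FALSE (the default), nearZeroVar() returns the
# positions of the zero and near-zero variance columns
nzv_cols <- nearZeroVar(toy)

# Keep everything else; setdiff() stays safe when nothing is flagged,
# whereas toy[, -nzv_cols] would drop every column in that case
keep <- setdiff(seq_along(toy), nzv_cols)
toy_filtered <- toy[, keep, drop = FALSE]

names(toy_filtered)
```

The same column positions should be applied to the test set so that both sets keep identical features.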

Feature selection methods are commonly grouped into three families:

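  • Filter methods (e.g., zero variance, near-zero variance, correlation)

  • Wrapper methods (e.g., forward selection, backward elimination, recursive feature elimination (RFE))

  • Embedded methods (e.g., lasso (L1) regularization, which can shrink coefficients to exactly zero; ridge (L2) regularization shrinks coefficients but does not remove features)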
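As an illustration of another filter method from the list above, the following base R sketch drops numeric features that are highly correlated with an earlier feature. The helper `drop_correlated()`, the 0.9 cutoff, and the simulated columns are all assumptions made for this example; caret offers `findCorrelation()` for the same purpose, which uses a different rule to choose which feature of a pair to remove.

```r
# Hypothetical helper: drop any numeric column whose absolute correlation
# with an earlier column exceeds the cutoff
drop_correlated <- function(df, cutoff = 0.9) {
  cors <- abs(cor(df))
  # Zero out the diagonal and upper triangle so each pair is checked once,
  # comparing every column only against the columns that precede it
  cors[upper.tri(cors, diag = TRUE)] <- 0
  # Simplification: a column is compared against all earlier columns,
  # including ones that were themselves dropped
  drop <- apply(cors, 1, function(r) any(r > cutoff))
  df[, !drop, drop = FALSE]
}

# Simulated data: x2 is almost a copy of x1, x3 is independent noise
set.seed(42)
x1 <- rnorm(200)
sim <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(200, sd = 0.05),
  x3 = rnorm(200)
)

names(drop_correlated(sim))
```

Here `x2` is removed because it is nearly identical to `x1`, while `x1` and `x3` are kept.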

Source: https://www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/