15.5 Automated machine learning

It involves performing an automated search across multiple base learners and then stack the resulting models.

Functionality Commercial Products Open Source Solutions
Feature Engineering Yes Limited
Model Selection Yes Yes
Hyperparameter Optimization Yes Yes
Model Validation Procedures Yes Limited
Comparison of Model Performance Yes Yes

The AutoML provides us direction for further analysis, as it can explore which models fits better with data as we use that time to perform other tasks.

# Use AutoML to find a list of candidate models (i.e., leaderboard)
auto_ml <- h2o.automl(
  x = X,
  y = Y,
  training_frame = train_h2o,
  nfolds = 5, 
  max_runtime_secs = 60 * 120, # 2 hour limit
  max_models = 50,
  keep_cross_validation_predictions = TRUE,
  sort_metric = "RMSE",
  seed = 123,
  stopping_rounds = 50,
  stopping_metric = "RMSE",
  stopping_tolerance = 0
)

# Assess the leader board; the following truncates the results to show the top 
# and bottom 15 models. You can get the top model with auto_ml@leader
auto_ml@leaderboard %>% 
  as.data.frame() %>%
  subset(select = c(model_id, rmse)) %>%
  rbind(head(., 15L),
        tail(., 15L))
##                                               model_id   rmse
## 1                     XGBoost_1_AutoML_20190220_084553   22229.97
## 2            GBM_grid_1_AutoML_20190220_084553_model_1   22437.26
## 3            GBM_grid_1_AutoML_20190220_084553_model_3   22777.57
## 4                         GBM_2_AutoML_20190220_084553   22785.60
## 5                         GBM_3_AutoML_20190220_084553   23133.59
## 6                         GBM_4_AutoML_20190220_084553   23185.45
## 7                     XGBoost_2_AutoML_20190220_084553   23199.68
## 8                     XGBoost_1_AutoML_20190220_075753   23231.28
## 9                         GBM_1_AutoML_20190220_084553   23326.57
## 10           GBM_grid_1_AutoML_20190220_075753_model_2   23330.42
## 11                    XGBoost_3_AutoML_20190220_084553   23475.23
## 12       XGBoost_grid_1_AutoML_20190220_084553_model_3   23550.04
## 13      XGBoost_grid_1_AutoML_20190220_075753_model_15   23640.95
## 14       XGBoost_grid_1_AutoML_20190220_084553_model_8   23646.66
## 15       XGBoost_grid_1_AutoML_20190220_084553_model_6   23682.37
## ...                                                ...        ...
## 65           GBM_grid_1_AutoML_20190220_084553_model_5   33971.32
## 66           GBM_grid_1_AutoML_20190220_075753_model_8   34489.39
## 67  DeepLearning_grid_1_AutoML_20190220_084553_model_3   36591.73
## 68           GBM_grid_1_AutoML_20190220_075753_model_6   36667.56
## 69      XGBoost_grid_1_AutoML_20190220_084553_model_13   40416.32
## 70           GBM_grid_1_AutoML_20190220_075753_model_9   47744.43
## 71    StackedEnsemble_AllModels_AutoML_20190220_084553   49856.66
## 72    StackedEnsemble_AllModels_AutoML_20190220_075753   59127.09
## 73 StackedEnsemble_BestOfFamily_AutoML_20190220_084553   76714.90
## 74 StackedEnsemble_BestOfFamily_AutoML_20190220_075753   76748.40
## 75           GBM_grid_1_AutoML_20190220_075753_model_5   78465.26
## 76           GBM_grid_1_AutoML_20190220_075753_model_3   78535.34
## 77           GLM_grid_1_AutoML_20190220_075753_model_1   80284.34
## 78           GLM_grid_1_AutoML_20190220_084553_model_1   80284.34
## 79       XGBoost_grid_1_AutoML_20190220_075753_model_4   92559.44
## 80      XGBoost_grid_1_AutoML_20190220_075753_model_10  125384.88