15.5 Automated machine learning
Automated machine learning (AutoML) involves performing an automated search across multiple base learners and then stacking the resulting models.
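To make the "search, then stack" idea concrete, here is a minimal base R sketch of stacking two base learners. It is an illustration only and not how h2o implements AutoML: the `mtcars` data, the two `lm()` learners, and all helper names (`fit_linear`, `fit_quadratic`, `cv_preds`, `meta`) are assumptions chosen to keep the example small.

```r
# Minimal stacking sketch in base R (illustrative; not h2o's implementation).
set.seed(123)
dat <- mtcars
k <- 5
# Randomly assign each row to one of k folds
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Two hypothetical base learners that predict mpg
fit_linear    <- function(d) lm(mpg ~ wt + hp, data = d)
fit_quadratic <- function(d) lm(mpg ~ poly(wt, 2) + poly(hp, 2), data = d)
learners <- list(linear = fit_linear, quadratic = fit_quadratic)

# Out-of-fold predictions: each row is predicted by a model that never saw it
cv_preds <- sapply(learners, function(fit_fun) {
  preds <- numeric(nrow(dat))
  for (i in seq_len(k)) {
    held_out <- fold == i
    model <- fit_fun(dat[!held_out, ])
    preds[held_out] <- predict(model, newdata = dat[held_out, ])
  }
  preds
})

# Meta-learner: regress the response on the base learners' out-of-fold predictions
level_one <- data.frame(mpg = dat$mpg, cv_preds)
meta <- lm(mpg ~ linear + quadratic, data = level_one)

# Cross-validated RMSE of each base learner, for comparison
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
cv_rmse <- apply(cv_preds, 2, rmse, actual = dat$mpg)
print(cv_rmse)
print(coef(meta))
```

The meta-learner's coefficients show how much weight the stack puts on each base learner. An AutoML tool automates the same pattern over many more learners and hyperparameter settings.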
Functionality | Commercial Products | Open Source Solutions |
---|---|---|
Feature Engineering | Yes | Limited |
Model Selection | Yes | Yes |
Hyperparameter Optimization | Yes | Yes |
Model Validation Procedures | Yes | Limited |
Comparison of Model Performance | Yes | Yes |
AutoML gives us direction for further analysis: it can explore which models fit the data best while we use that time to perform other tasks.
# Use AutoML to find a list of candidate models (i.e., leaderboard)
auto_ml <- h2o.automl(
  x = X,
  y = Y,
  training_frame = train_h2o,
  nfolds = 5,
  max_runtime_secs = 60 * 120,  # 2 hour limit
  max_models = 50,
  keep_cross_validation_predictions = TRUE,
  sort_metric = "RMSE",
  seed = 123,
  stopping_rounds = 50,
  stopping_metric = "RMSE",
  stopping_tolerance = 0
)
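The call above assumes that `X`, `Y`, and `train_h2o` already exist and that an H2O cluster is running. A minimal end-to-end sketch of that setup follows, using the built-in `iris` data as a stand-in and a much smaller model budget so it finishes quickly. The data set, the response `Sepal.Length`, the 80/20 split, and the names `test_h2o`, `auto_ml_small`, and `test_rmse` are all assumptions for illustration. It requires the h2o package and a Java runtime.

```r
library(h2o)

h2o.init()          # start (or connect to) a local H2O cluster
h2o.no_progress()   # suppress progress bars

# Stand-in data: predict Sepal.Length from the remaining iris columns
iris_h2o <- as.h2o(iris)
splits <- h2o.splitFrame(iris_h2o, ratios = 0.8, seed = 123)
train_h2o <- splits[[1]]
test_h2o  <- splits[[2]]

Y <- "Sepal.Length"
X <- setdiff(names(iris), Y)

# Same interface as the full search, with a much smaller budget
auto_ml_small <- h2o.automl(
  x = X,
  y = Y,
  training_frame = train_h2o,
  nfolds = 5,
  max_models = 5,
  sort_metric = "RMSE",
  seed = 123
)

# Evaluate the top-ranked model on rows it was not trained on
best <- auto_ml_small@leader
perf <- h2o.performance(best, newdata = test_h2o)
test_rmse <- h2o.rmse(perf)
print(test_rmse)

h2o.shutdown(prompt = FALSE)
```

Scoring the leader on a held-out frame gives a check that is independent of the cross-validated metric used to rank the leaderboard.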
# Assess the leaderboard; the following truncates the results to show the top
# and bottom 15 models. You can get the top model with auto_ml@leader
# (%>% comes from magrittr, which dplyr also re-exports)
auto_ml@leaderboard %>%
  as.data.frame() %>%
  subset(select = c(model_id, rmse)) %>%
  # braces stop the pipe from also passing the full data frame to rbind()
  { rbind(head(., 15L), tail(., 15L)) }
## model_id rmse
## 1 XGBoost_1_AutoML_20190220_084553 22229.97
## 2 GBM_grid_1_AutoML_20190220_084553_model_1 22437.26
## 3 GBM_grid_1_AutoML_20190220_084553_model_3 22777.57
## 4 GBM_2_AutoML_20190220_084553 22785.60
## 5 GBM_3_AutoML_20190220_084553 23133.59
## 6 GBM_4_AutoML_20190220_084553 23185.45
## 7 XGBoost_2_AutoML_20190220_084553 23199.68
## 8 XGBoost_1_AutoML_20190220_075753 23231.28
## 9 GBM_1_AutoML_20190220_084553 23326.57
## 10 GBM_grid_1_AutoML_20190220_075753_model_2 23330.42
## 11 XGBoost_3_AutoML_20190220_084553 23475.23
## 12 XGBoost_grid_1_AutoML_20190220_084553_model_3 23550.04
## 13 XGBoost_grid_1_AutoML_20190220_075753_model_15 23640.95
## 14 XGBoost_grid_1_AutoML_20190220_084553_model_8 23646.66
## 15 XGBoost_grid_1_AutoML_20190220_084553_model_6 23682.37
## ... ... ...
## 65 GBM_grid_1_AutoML_20190220_084553_model_5 33971.32
## 66 GBM_grid_1_AutoML_20190220_075753_model_8 34489.39
## 67 DeepLearning_grid_1_AutoML_20190220_084553_model_3 36591.73
## 68 GBM_grid_1_AutoML_20190220_075753_model_6 36667.56
## 69 XGBoost_grid_1_AutoML_20190220_084553_model_13 40416.32
## 70 GBM_grid_1_AutoML_20190220_075753_model_9 47744.43
## 71 StackedEnsemble_AllModels_AutoML_20190220_084553 49856.66
## 72 StackedEnsemble_AllModels_AutoML_20190220_075753 59127.09
## 73 StackedEnsemble_BestOfFamily_AutoML_20190220_084553 76714.90
## 74 StackedEnsemble_BestOfFamily_AutoML_20190220_075753 76748.40
## 75 GBM_grid_1_AutoML_20190220_075753_model_5 78465.26
## 76 GBM_grid_1_AutoML_20190220_075753_model_3 78535.34
## 77 GLM_grid_1_AutoML_20190220_075753_model_1 80284.34
## 78 GLM_grid_1_AutoML_20190220_084553_model_1 80284.34
## 79 XGBoost_grid_1_AutoML_20190220_075753_model_4 92559.44
## 80 XGBoost_grid_1_AutoML_20190220_075753_model_10 125384.88
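A long leaderboard is often easier to read when collapsed to the best score per algorithm family. The sketch below does this in base R on a handful of rows copied from the listing above. It assumes that the text before the first underscore in `model_id` identifies the family, which is inferred from the IDs shown here rather than guaranteed by h2o. The names `leaderboard`, `family`, and `best_by_family` are hypothetical.

```r
# A few rows copied from the leaderboard printed above
leaderboard <- data.frame(
  model_id = c(
    "XGBoost_1_AutoML_20190220_084553",
    "GBM_grid_1_AutoML_20190220_084553_model_1",
    "GBM_2_AutoML_20190220_084553",
    "DeepLearning_grid_1_AutoML_20190220_084553_model_3",
    "StackedEnsemble_AllModels_AutoML_20190220_084553",
    "GLM_grid_1_AutoML_20190220_084553_model_1"
  ),
  rmse = c(22229.97, 22437.26, 22785.60, 36591.73, 49856.66, 80284.34),
  stringsAsFactors = FALSE
)

# Assumption: the prefix before the first underscore names the algorithm family
leaderboard$family <- sub("_.*$", "", leaderboard$model_id)

# Best (lowest) RMSE per family, sorted from best to worst
best_by_family <- aggregate(rmse ~ family, data = leaderboard, FUN = min)
best_by_family <- best_by_family[order(best_by_family$rmse), ]
print(best_by_family)
```

With a real run, the same steps could be applied to `as.data.frame(auto_ml@leaderboard)` in place of the hand-built data frame.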