16.9 Modeling

  • Let’s explore some different models with different dimensionality reduction techniques: single layer neural network, bagged trees, flexible discriminant analysis (FDA), naive Bayes, and regularized discriminant analysis (RDA)

(This is slow so I don’t actually run it here.)

ctrl <- control_grid(parallel_over = "everything")
bean_res <- 
  workflow_set(
    preproc = list(basic = class ~., pls = pls_rec, umap = umap_rec), 
    models = list(bayes = bayes_spec, fda = fda_spec,
                  rda = rda_spec, bag = bagging_pec,
                  mlp = mlp_spec)
  ) %>% 
  workflow_map(
    verbose = TRUE,
    seed = 1703,
    resamples = bean_val,
    grid = 10,
    metrics = metric_set(roc_auc)
  )

rankings <- 
  rank_results(bean_res, select_best = TRUE) %>% 
  mutate(method = map_chr(wflow_id, ~ str_split(.x, "_", simplify = TRUE)[1])) 

rankings %>% 
  ggplot(aes(x = rank, y = mean, pch = method, col = model)) + 
  geom_point(cex = 3) + 
  theme(legend.position = "right") +
  labs(y = "ROC AUC") +
  coord_cartesian(ylim = c(0, 1))

Most models give good performance. Regularized discriminant analysis with PLS seems the best.