16.5 Principal Component Analysis (PCA)

  • Unsupervised method: acts on the data without any regard for the outcome
  • Finds new features (linear combinations of the original predictors) that account for as much of the variation in the original data as possible
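To make "as much variation as possible" concrete, here is a minimal sketch using base R's `prcomp()`. It uses the built-in `iris` measurements as a stand-in (an assumption for illustration, not the beans data) and leaves the outcome column out, since PCA never looks at it.

```r
# Illustrative only: built-in iris data stands in for the beans data.
# Columns 1:4 are the numeric predictors; Species (the outcome) is not used.
pca_fit <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Each component's variance is its squared standard deviation.
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# Proportion and cumulative proportion of variance per component.
variance_table <- data.frame(
  component  = paste0("PC", seq_along(var_explained)),
  proportion = round(var_explained, 3),
  cumulative = round(cumsum(var_explained), 3)
)
print(variance_table)
```

The proportions are sorted from largest to smallest by construction, which is why keeping only the first few components often retains most of the signal.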
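
# Add a PCA step that keeps four components, then plot the results
# with the plot_validation_results() helper from earlier in the chapter
bean_rec_trained %>%
  step_pca(all_numeric_predictors(), num_comp = 4) %>%
  plot_validation_results() +
  ggtitle("Principal Component Analysis")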

The first two components separate the classes well. Which predictors drive them? The top loadings for each component show this:

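library(learntidymodels)  # provides plot_top_loadings()

# Train the added PCA step, then plot the five largest loadings
# for each of the first four components
bean_rec_trained %>%
  step_pca(all_numeric_predictors(), num_comp = 4) %>%
  prep() %>%
  plot_top_loadings(component_number <= 4, n = 5) +
  scale_fill_brewer(palette = "Paired") +
  ggtitle("Principal Component Analysis")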

The predictors with the largest loadings on PC1 are all related to bean size, while those on PC2 are measures of elongation.
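The same kind of reading can be done numerically. As a rough sketch, the loadings of a base R `prcomp()` fit are stored in its `rotation` matrix, and ranking them by absolute value shows which predictors dominate each component. The `iris` data is again a stand-in for the beans data, and `top_loadings()` is a hypothetical helper written for this example, not part of any package.

```r
# Illustrative only: built-in iris data stands in for the beans data.
pca_fit <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Rows are the original predictors, columns are the components.
loadings <- pca_fit$rotation

# Hypothetical helper: the n predictors with the largest absolute
# loading on one component, returned as a named numeric vector.
top_loadings <- function(loadings, component, n = 2) {
  values <- loadings[, component]
  values[order(abs(values), decreasing = TRUE)][seq_len(n)]
}

top_loadings(loadings, "PC1")
top_loadings(loadings, "PC2")
```

The sign of a loading only indicates direction along the component. The magnitude is what tells you how strongly a predictor contributes, which is why the ranking uses absolute values.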