16.5 Principal Component Analysis (PCA)
- Unsupervised method: acts on the data without any regard for the outcome
- Finds new features (linear combinations of the original predictors) that account for as much of the variation in the original data as possible
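The two points above can be illustrated with a minimal sketch. It is not the chapter's beans workflow: it assumes base R's `prcomp()` and the built-in `iris` data as stand-ins, and the names `predictors`, `pca_fit`, and `var_explained` are chosen here for illustration only.

```r
# Sketch only: base R prcomp() on the built-in iris data, not the beans data.
predictors <- iris[, 1:4]  # Species (the outcome) is deliberately left out

# Center and scale so that no predictor dominates because of its units
pca_fit <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each component
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
round(var_explained, 3)

# Running total: how much variation the first k components retain
cumsum(var_explained)
```

The outcome column never enters the computation, which is what "unsupervised" means here, and the components come out ordered from most to least variance explained.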
bean_rec_trained %>%
  step_pca(all_numeric_predictors(), num_comp = 4) %>%
  plot_validation_results() +
  ggtitle("Principal Component Analysis")
We can see the first two components separate the classes well. How do they do this?
library(learntidymodels)
bean_rec_trained %>%
  step_pca(all_numeric_predictors(), num_comp = 4) %>%
  prep() %>%
  plot_top_loadings(component_number <= 4, n = 5) +
  scale_fill_brewer(palette = "Paired") +
  ggtitle("Principal Component Analysis")
The predictors contributing to PC1 are all related to size, while PC2 relates to measures of elongation.
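A loadings plot like the one above ranks, for each component, the original predictors by the size of their coefficients. As a rough sketch of the same idea without the plotting helper, the loadings can be read directly from a fitted PCA. This again assumes base R's `prcomp()` on the built-in `iris` data rather than the beans recipe, and `loadings` and `top_loadings` are illustrative names.

```r
# Sketch only: inspect loadings from base R prcomp() on iris, not the beans data.
pca_fit <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Rows are the original predictors, columns are the components
loadings <- pca_fit$rotation

# For the first two components, order predictors by absolute loading
top_loadings <- lapply(c("PC1", "PC2"), function(pc) {
  ord <- order(abs(loadings[, pc]), decreasing = TRUE)
  loadings[ord, pc]
})
names(top_loadings) <- c("PC1", "PC2")
top_loadings
```

Predictors with the largest absolute loadings on a component are the ones that drive it, which is how a component comes to be interpreted as, for example, "size" or "elongation".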