7.4 Summary

This chapter focussed on ML techniques as parts of a holistic analysis pipeline rather than on individual ML algorithms or methods. The best practices it covered are summarised below:

Conduct exploratory data analysis to understand the underlying structure of the data and relationships between variables.
Apply feature engineering techniques to create new variables and enhance the model’s predictive power.
Select machine learning models that are contextually appropriate and robust for public health data analysis, such as Random Forest and Generalised Linear Models, among others.
Use parameter calibration techniques such as cross-validation, regularisation, Monte Carlo methods, and grid search to optimise model performance.
Evaluate model performance using appropriate metrics and visualisation tools to assess predictive accuracy and relevance.
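
The calibration and evaluation steps above can be sketched in a few lines. The example below is a minimal illustration in Python using only the standard library, and it is not the code used in this chapter. The synthetic data, the one-feature ridge model, the grid of penalty values, and all function names (`make_data`, `fit_ridge`, `cv_rmse`, `grid_search`) are assumptions made for the example. It tunes a regularisation penalty by grid search with 5-fold cross-validation, then reports the RMSE on a held-out test set.

```python
import random
import statistics


def make_data(n=120, seed=0):
    """Synthetic data (assumed for illustration): y = 2x + 1 + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


def fit_ridge(xs, ys, lam):
    """One-feature ridge regression; the penalty applies to the slope only."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def rmse(model, xs, ys):
    """Root mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def cv_rmse(xs, ys, lam, k=5):
    """Mean RMSE over k folds; fold membership is index modulo k."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(rmse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return statistics.fmean(scores)


def grid_search(xs, ys, grid, k=5):
    """Return the penalty with the lowest cross-validated RMSE, plus all scores."""
    results = {lam: cv_rmse(xs, ys, lam, k) for lam in grid}
    best = min(results, key=results.get)
    return best, results


xs, ys = make_data()
split = int(0.8 * len(xs))  # last 20% is held out for the final evaluation
grid = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]

best_lam, cv_results = grid_search(xs[:split], ys[:split], grid)
final_model = fit_ridge(xs[:split], ys[:split], best_lam)
test_rmse = rmse(final_model, xs[split:], ys[split:])

print(f"best penalty: {best_lam}")
print(f"held-out RMSE: {test_rmse:.3f}")
```

The held-out set is kept out of the grid search so that the final RMSE is not biased by the tuning step. In practice, a library implementation of cross-validation and grid search would normally replace these hand-written helpers.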

TLDR: It’s not just about applying an individual ML model, but about considering the goals, dataset, preprocessing, calibration, and evaluation around it.