7.4 Summary

This chapter focussed on ML techniques as a holistic analysis pipeline rather than on individual ML algorithms or methods. The best practices it covered are summarised below:

  • Conduct exploratory data analysis to understand the underlying structure of the data and relationships between variables.
  • Apply feature engineering techniques to create new variables and enhance the model’s predictive power.
  • Select machine learning models that are contextually appropriate and robust for public health data analysis, such as Random Forest and Generalised Linear Models.
  • Use parameter calibration techniques such as cross-validation, regularisation, Monte Carlo methods, and grid search to optimise model performance.
  • Evaluate model performance using appropriate metrics and visualisation tools to assess predictive accuracy and relevance.
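
As a compact illustration of the calibration and evaluation steps listed above, the following is a minimal sketch, not the workflow used in this chapter. It assumes Python with only the standard library, synthetic data generated on the spot, and a one-feature ridge regression standing in for a real model. The function names (`fit_ridge`, `rmse`, `cross_validate`) and the penalty grid are hypothetical choices made for the example.

```python
import random
import statistics


def fit_ridge(xs, ys, penalty):
    """Fit y = intercept + slope * x with an L2 penalty on the slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)  # penalty = 0 gives ordinary least squares
    return my - slope * mx, slope


def rmse(model, xs, ys):
    """Root mean squared error of a fitted (intercept, slope) pair."""
    intercept, slope = model
    squared = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def cross_validate(xs, ys, penalty, k=5):
    """Mean validation RMSE over k folds for one candidate penalty."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        model = fit_ridge([xs[i] for i in train], [ys[i] for i in train], penalty)
        scores.append(rmse(model, [xs[i] for i in valid], [ys[i] for i in valid]))
    return statistics.fmean(scores)


# Synthetic data (an assumption for illustration, not a real health dataset).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(120)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 20 observations for the final evaluation.
train_x, test_x = xs[:100], xs[100:]
train_y, test_y = ys[:100], ys[100:]

# Grid search: score every candidate penalty by cross-validation.
penalty_grid = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_scores = {p: cross_validate(train_x, train_y, p) for p in penalty_grid}
best_penalty = min(cv_scores, key=cv_scores.get)

# Refit on all training data and evaluate once on the held-out set.
final_model = fit_ridge(train_x, train_y, best_penalty)
test_rmse = rmse(final_model, test_x, test_y)

print(f"best penalty: {best_penalty}")
print(f"held-out RMSE: {test_rmse:.3f}")
```

The held-out set is used only once, after the penalty has been chosen, so the reported error is not biased by the tuning itself. The same separation between calibration and evaluation applies whichever model and tooling are used.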

TL;DR: It’s not just about applying an individual ML model, but about considering the goals, dataset, preprocessing, calibration, and evaluation of the model.