7.4 Summary
This chapter focussed on ML techniques as a holistic analysis pipeline rather than on individual ML algorithms or methods. The best practices it covered are summarised below:
- Conduct exploratory data analysis to understand the underlying structure of the data and relationships between variables.
- Apply feature engineering techniques to create new variables and enhance the model’s predictive power.
- Select machine learning models that are contextually appropriate and robust for public health data analysis, such as Random Forest, Generalised Linear Models, and others.
- Use parameter calibration techniques such as cross-validation, regularisation, Monte Carlo methods, and grid search to optimise model performance.
- Evaluate model performance using appropriate metrics and visualisation tools to assess predictive accuracy and relevance.
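The calibration and evaluation steps above can be sketched in a few lines. The example below is a minimal illustration, not the workflow used in this chapter. It assumes synthetic data and a one-predictor ridge regression in place of the models named above. It uses only the Python standard library, and every function name (`fit_ridge`, `cross_val_mse`, `grid_search`) is hypothetical. It shows a grid search over a regularisation penalty scored by k-fold cross-validation, followed by evaluation on held-out data.

```python
import random
from statistics import mean


def fit_ridge(xs, ys, lam):
    """Fit y ~ intercept + slope * x with an L2 penalty `lam` on the slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # the penalty shrinks the slope towards zero
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) model."""
    intercept, slope = model
    return mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(xs, ys, lam, folds):
    """Average validation MSE across folds for one penalty value."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(
            mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return mean(scores)


def grid_search(xs, ys, grid, k=5, seed=0):
    """Return the penalty with the lowest cross-validated MSE, plus all scores."""
    folds = k_fold_indices(len(xs), k, random.Random(seed))
    cv_scores = {lam: cross_val_mse(xs, ys, lam, folds) for lam in grid}
    best_lam = min(cv_scores, key=cv_scores.get)
    return best_lam, cv_scores


# Synthetic data (an assumption for illustration, not real public health data).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 15 observations for the final evaluation.
train_x, test_x = xs[:45], xs[45:]
train_y, test_y = ys[:45], ys[45:]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, cv_scores = grid_search(train_x, train_y, grid)
final_model = fit_ridge(train_x, train_y, best_lam)
test_mse = mse(final_model, test_x, test_y)

print(f"best penalty: {best_lam}, held-out MSE: {test_mse:.3f}")
```

The held-out observations are set aside before any tuning, so the final error estimate is not influenced by the choice of penalty. The same pattern applies when a more flexible model, such as a Random Forest, replaces the ridge fit.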
TL;DR: Machine learning is not just about applying an individual model; it is about considering the goals, dataset, preprocessing, calibration, and evaluation of the model together.