1.2 Good Practice guidelines
Several steps are vital to good modeling:
- understand the process being modeled
- collect appropriate data
- understand variation in the response
- select relevant predictors
- utilize a range of models
Even all of these may not be enough when a model performs poorly.
The answer might lie in the way the predictors are presented to the model.
1.2.1 What is feature engineering
“…best re-representation of the predictors to improve model performance.” (cf. Preface)
What are the possible ways to achieve better performance?
- transform the predictors with special functions (log/exp)
- add an interaction term (prod/ratio)
- add a functional transformation (splines/poly)
- add a re-representation of the predictors (mean/median/standardization)
- impute missing values (knn/bagging)
Disclaimer: Risk of Overfitting!
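The transformations listed above can be sketched on a toy data set. This is a minimal illustration using only the Python standard library, not the book's own code: the predictors `area` and `rooms` and their values are hypothetical, and median imputation is used as a simple stand-in for the knn/bagging imputation named in the list.

```python
import math
import statistics

# Hypothetical toy predictors (not from the book).
area = [50.0, 80.0, 120.0, 200.0, 400.0]   # right-skewed predictor
rooms = [2.0, 3.0, None, 5.0, 8.0]         # predictor with one missing value

# Special-function transform: the log compresses the long right tail.
log_area = [math.log(a) for a in area]

# Imputation: fill the gap with the median of the observed values
# (assumed stand-in for knn/bagging imputation).
observed = [r for r in rooms if r is not None]
rooms_median = statistics.median(observed)
rooms_filled = [rooms_median if r is None else r for r in rooms]

# Interaction term: ratio of two predictors (needs the imputed column).
area_per_room = [a / r for a, r in zip(area, rooms_filled)]

# Functional transformation: a quadratic (polynomial) term.
area_sq = [a ** 2 for a in area]


def standardize(values):
    """Re-represent values with mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


area_std = standardize(area)
```

Each new column is a candidate predictor. Whether it helps should be judged on held-out data, since every added feature increases the risk of overfitting.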
1.2.2 Nature of modeling
Estimating uncertainty (noise) is another very important step.
“If a model is only 50% accurate should it be used to make inferences or predictions?”
The trade-off between accuracy and interpretability is important: a neural network model might be harder to explain but can provide a higher level of accuracy.
Feature engineering is a matter of choosing the variable transformations that give the best performance.
Models can also react badly to:
- multicollinearity or correlation between predictors
- missing values
- irrelevant predictors
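The first two problems in the list above can be screened for before any model is fitted. The sketch below is one simple way to do it, under stated assumptions: the column names, the values, and the 0.9 correlation threshold are all hypothetical choices, and pairwise Pearson correlation only catches collinearity between two predictors at a time.

```python
import math

# Hypothetical predictors; None marks a missing value.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],   # same information as height_cm
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "weight_kg": [55.0, None, 70.0, None, 90.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Missing values: count them per predictor.
missing = {name: sum(v is None for v in col) for name, col in data.items()}

# Correlation: compare only the columns without missing values.
complete = [name for name in data if missing[name] == 0]
threshold = 0.9  # assumed cut-off, to be tuned per problem
flagged = [
    (a, b)
    for i, a in enumerate(complete)
    for b in complete[i + 1:]
    if abs(pearson(data[a], data[b])) > threshold
]
```

Here `flagged` holds the pair of height columns, which carry the same information in different units, and `missing` shows that `weight_kg` needs imputation or removal before modeling.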