1.4 The data analysis process
- Cleaning the data: investigate the data to make sure that they are applicable to the project goals, accurate, and appropriate
- Understanding the data: often referred to as exploratory data analysis (EDA). EDA brings to light how the different variables relate to one another, along with their distributions, typical ranges, and other attributes.
- “How did I come by these data?”
- “Are the data relevant?”
- Develop clear expectations of the goal of your model and how performance will be judged (Chapter 9)
- “What are the performance metrics, and what is a realistic goal for what can be achieved?”
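
The steps above can be sketched in a few lines of code. This is only an illustrative sketch in standard-library Python, not a method prescribed by the text: the toy records, the column names (`sqft`, `price`), and the helper functions `summarize`, `pearson`, and `baseline_rmse` are all hypothetical. It shows a cleaning-style check (missing values and ranges), an EDA-style check (how two variables are related), and one possible way to set a performance expectation, assuming RMSE is the chosen metric and "always predict the mean" is the baseline to beat.

```python
from statistics import mean, median

# Hypothetical toy records; None marks a missing measurement.
rows = [
    {"sqft": 1200, "price": 200_000},
    {"sqft": 1500, "price": 250_000},
    {"sqft": None, "price": 180_000},
    {"sqft": 2000, "price": 330_000},
    {"sqft": 1700, "price": 290_000},
]


def summarize(rows, column):
    """Cleaning-style check: count missing values and report typical ranges."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


def pearson(rows, x, y):
    """EDA-style check: Pearson correlation over rows where both values exist."""
    pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


def baseline_rmse(rows, target):
    """Expectation setting (assumed metric): RMSE of always predicting the mean."""
    values = [r[target] for r in rows if r[target] is not None]
    centre = mean(values)
    return (sum((v - centre) ** 2 for v in values) / len(values)) ** 0.5


sqft_summary = summarize(rows, "sqft")
price_summary = summarize(rows, "price")
r_sqft_price = pearson(rows, "sqft", "price")
price_baseline = baseline_rmse(rows, "price")

print(sqft_summary)
print(price_summary)
print(round(r_sqft_price, 3), round(price_baseline))
```

A baseline like this gives the question about realistic goals something concrete to compare against: a model whose error is no better than the baseline has not learned anything useful from the predictors.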