1.4 The data analysis process

  1. Cleaning the data: investigate the data to make sure that they are applicable to the project goals, accurate, and appropriate.
  2. Understanding the data: often referred to as exploratory data analysis (EDA). EDA reveals how the variables relate to one another, along with their distributions, typical ranges, and other attributes.
    • “How did I come by these data?”
    • “Are the data relevant?”
  3. Develop clear expectations for the goal of your model and how its performance will be judged (Chapter 9).
    • “What are the performance metrics, and what are realistic goals for what can be achieved?”
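
As a rough illustration of the three steps above, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example rather than something from the source: the toy housing records, the helper names (summarize, pearson, rmse), the choice to drop incomplete rows as the cleaning rule, and the choice of RMSE as the performance metric.

```python
import math
import statistics

# Hypothetical toy records (not from the source); None marks a missing value.
rows = [
    {"sqft": 850, "price": 170.0},
    {"sqft": 1200, "price": 235.0},
    {"sqft": None, "price": 210.0},
    {"sqft": 1500, "price": 310.0},
    {"sqft": 2000, "price": 395.0},
]

# Step 1 (cleaning): keep only rows with no missing values.
# Dropping incomplete rows is one simple option, assumed here for brevity.
clean = [r for r in rows if all(v is not None for v in r.values())]

sqft = [r["sqft"] for r in clean]
price = [r["price"] for r in clean]


# Step 2 (EDA): typical ranges and how two variables relate.
def summarize(values):
    """Return the minimum, median, and maximum of a list of numbers."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


sqft_summary = summarize(sqft)
r_value = pearson(sqft, price)


# Step 3 (expectations): fix the metric before modeling.
# RMSE is an assumed choice; the right metric depends on the project goals.
def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(statistics.fmean(squared_errors))


# A mean-only baseline gives a reference point for what counts as "good".
baseline = [statistics.fmean(price)] * len(price)
baseline_rmse = rmse(price, baseline)

print(sqft_summary)
print(round(r_value, 3), round(baseline_rmse, 2))
```

Computing a baseline error before fitting anything is one way to make "realistic goals" concrete: a candidate model is only worth keeping if it clearly beats predicting the mean.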

The data science process (from R for Data Science by Wickham and Grolemund).