21.1 Dataset used for demonstrating inference

I will use the TidyTuesday dataset on ultra trail running races.

The data comes from Benjamin Nowak by way of the International Trail Running Association (ITRA). The original repository is available on GitHub.

library(tidyverse)  # provides read_csv(), the dplyr verbs and glimpse() used below

race <- read_csv("data/21_race.csv", show_col_types = FALSE)
ranking <- read_csv("data/21_ultra_rankings.csv", show_col_types = FALSE)

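# Mean finishing time of the top-10 finishers in each race
best_results <- ranking %>%
  filter(rank <= 10) %>%
  group_by(race_year_id) %>%
  summarise(time_in_seconds = mean(time_in_seconds), top_10 = n()) %>%
  # keep only races with exactly ten results ranked 10th or better
  filter(top_10 == 10) %>%
  select(-top_10)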

race_top_results <- race %>%
  # `||` is a scalar operator; %in% tests every row against both spellings
  filter(participation %in% c("solo", "Solo"), distance > 0) %>%
  inner_join(best_results, by = "race_year_id") %>%
  mutate(
    avg_elevation_gain = elevation_gain / distance,   # climb per unit distance
    avg_velocity = distance / time_in_seconds * 3600  # distance per hour
  )

glimpse(race_top_results)

## Rows: 976
## Columns: 16
## $ race_year_id       <dbl> 68140, 72496, 69855, 67856, 70469, 66887, 67851, 68…
## $ event              <chr> "Peak District Ultras", "UTMB®", "Grand Raid des Py…
## $ race               <chr> "Millstone 100", "UTMB®", "Ultra Tour 160", "PERSEN…
## $ city               <chr> "Castleton", "Chamonix", "vielle-Aure", "Asenovgrad…
## $ country            <chr> "United Kingdom", "France", "France", "Bulgaria", "…
## $ date               <date> 2021-09-03, 2021-08-27, 2021-08-20, 2021-08-20, 20…
## $ start_time         <time> 19:00:00, 17:00:00, 05:00:00, 18:00:00, 18:00:00, …
## $ participation      <chr> "solo", "Solo", "solo", "solo", "solo", "solo", "so…
## $ distance           <dbl> 166.9, 170.7, 167.0, 164.0, 159.9, 159.9, 163.8, 16…
## $ elevation_gain     <dbl> 4520, 9930, 9980, 7490, 100, 9850, 5460, 4630, 6410…
## $ elevation_loss     <dbl> -4520, -9930, -9980, -7500, -100, -9850, -5460, -46…
## $ aid_stations       <dbl> 10, 11, 13, 13, 12, 15, 5, 8, 13, 13, 12, 15, 0, 14…
## $ participants       <dbl> 150, 2300, 600, 150, 0, 300, 0, 200, 120, 300, 100,…
## $ time_in_seconds    <dbl> 113693.8, 79380.9, 103033.1, 90816.6, 79882.1, 1088…
## $ avg_elevation_gain <dbl> 27.0820851, 58.1722320, 59.7604790, 45.6707317, 0.6…
## $ avg_velocity       <dbl> 5.284721, 7.741409, 5.835018, 6.501014, 7.206120, 5…

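We will work with solo races that have a non-zero distance and a complete top 10, that is, exactly ten results with rank 10 or better. For each race, the average velocity is the race distance divided by the mean finishing time of those top-10 runners, converted from a per-second to a per-hour rate.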
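Note that the pipeline above derives `avg_velocity` from the mean of the top-10 times, which is not quite the same as averaging the ten individual velocities. The following sketch illustrates the difference on made-up numbers. The data frame `toy_ranking`, the value `toy_distance` and its unit (km) are assumptions for illustration only and are not taken from the dataset.

```r
# Hypothetical toy data: one race with ten ranked finishers (not real results).
toy_ranking <- data.frame(
  race_year_id    = rep(1, 10),
  rank            = 1:10,
  time_in_seconds = seq(36000, 54000, by = 2000)  # 10 h up to 15 h
)
toy_distance <- 100  # assumed to be in km for this illustration

# Approach used in this section: distance divided by the mean top-10 time.
mean_time <- mean(toy_ranking$time_in_seconds)              # 45000 s
velocity_from_mean_time <- toy_distance / mean_time * 3600  # 8 per hour

# Alternative: compute each runner's velocity first, then average.
individual_velocity <- toy_distance / toy_ranking$time_in_seconds * 3600
mean_of_velocities <- mean(individual_velocity)             # about 8.13

velocity_from_mean_time
mean_of_velocities
```

The mean of the individual velocities is never lower than the velocity based on the mean time, because faster runners contribute disproportionately to it. For the purposes of this chapter either definition would probably do, as long as it is applied consistently to every race.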

set.seed(345129)