21.1 Dataset used for demonstrating inference
I will use the TidyTuesday dataset on ultra trail running races.
The data come from Benjamin Nowak by way of the International Trail Running Association (ITRA). Their original repository is available on GitHub.
library(tidyverse)
# One row per race; one row per runner result
race <- read_csv("data/21_race.csv", show_col_types = FALSE)
ranking <- read_csv("data/21_ultra_rankings.csv", show_col_types = FALSE)
# Mean finishing time of the top 10 runners, keeping only races with a complete top 10
best_results <- ranking %>%
  filter(rank <= 10) %>%
  group_by(race_year_id) %>%
  summarise(time_in_seconds = mean(time_in_seconds), top_10 = n()) %>%
  filter(top_10 == 10) %>%
  select(-top_10)
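To see what the `best_results` pipeline keeps and drops, here is a minimal sketch on a hypothetical toy table (the name `toy_ranking` and its values are invented for illustration, and the sketch assumes dplyr is installed). Race 1 has twelve ranked finishers, race 2 only five, so only race 1 survives, with the mean time of its first ten finishers.

```r
library(dplyr)

# Hypothetical toy rankings: race 1 has 12 ranked finishers, race 2 only 5
toy_ranking <- data.frame(
  race_year_id = c(rep(1, 12), rep(2, 5)),
  rank = c(1:12, 1:5),
  time_in_seconds = c(seq(1000, 2100, by = 100), seq(1000, 1400, by = 100))
)

# Same steps as best_results
toy_best <- toy_ranking %>%
  filter(rank <= 10) %>%                      # keep the top 10 of each race
  group_by(race_year_id) %>%
  summarise(time_in_seconds = mean(time_in_seconds), top_10 = n()) %>%
  filter(top_10 == 10) %>%                    # drop races with fewer than 10 finishers
  select(-top_10)

print(toy_best)  # one row: race_year_id 1, time_in_seconds 1450
```

Because the check is `top_10 == 10` rather than `>= 10`, a race with ties at rank 10 (more than ten rows with `rank <= 10`) would also be dropped.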
# Solo races with a positive distance, joined to their top-10 mean time
race_top_results <- race %>%
  filter(participation %in% c("solo", "Solo"), distance > 0) %>%
  inner_join(best_results, by = "race_year_id") %>%
  mutate(
    avg_elevation_gain = elevation_gain / distance,   # elevation gain per unit distance
    avg_velocity = distance / time_in_seconds * 3600  # distance covered per hour
  )
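The solo filter needs a vectorised comparison that returns one `TRUE`/`FALSE` per row, which is what `filter()` expects. The base R sketch below uses a hypothetical `participation` vector to contrast `%in%`, `tolower()` and the element-wise `|` with the scalar `||`, which never yields one value per element: recent R versions (4.3 and later) raise an error, while older versions looked only at the first element.

```r
# Hypothetical participation values, mimicking the mixed capitalisation in the data
participation <- c("solo", "Solo", "team", "relay")

# Vectorised tests: one TRUE/FALSE per element
keep_in    <- participation %in% c("solo", "Solo")
keep_lower <- tolower(participation) == "solo"
keep_or    <- participation == "solo" | participation == "Solo"
print(keep_in)  # TRUE TRUE FALSE FALSE

# `||` is scalar: it errors on recent R versions; on older ones it used only the first element
scalar_or <- tryCatch(
  participation == "solo" || participation == "Solo",
  error = function(e) NA
)
print(length(scalar_or))  # 1, never one value per element
```

Note that `tolower()` would also match other spellings such as `"SOLO"`, whereas `%in%` matches only the listed values.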
glimpse(race_top_results)
## Rows: 976
## Columns: 16
## $ race_year_id <dbl> 68140, 72496, 69855, 67856, 70469, 66887, 67851, 68…
## $ event <chr> "Peak District Ultras", "UTMB®", "Grand Raid des Py…
## $ race <chr> "Millstone 100", "UTMB®", "Ultra Tour 160", "PERSEN…
## $ city <chr> "Castleton", "Chamonix", "vielle-Aure", "Asenovgrad…
## $ country <chr> "United Kingdom", "France", "France", "Bulgaria", "…
## $ date <date> 2021-09-03, 2021-08-27, 2021-08-20, 2021-08-20, 20…
## $ start_time <time> 19:00:00, 17:00:00, 05:00:00, 18:00:00, 18:00:00, …
## $ participation <chr> "solo", "Solo", "solo", "solo", "solo", "solo", "so…
## $ distance <dbl> 166.9, 170.7, 167.0, 164.0, 159.9, 159.9, 163.8, 16…
## $ elevation_gain <dbl> 4520, 9930, 9980, 7490, 100, 9850, 5460, 4630, 6410…
## $ elevation_loss <dbl> -4520, -9930, -9980, -7500, -100, -9850, -5460, -46…
## $ aid_stations <dbl> 10, 11, 13, 13, 12, 15, 5, 8, 13, 13, 12, 15, 0, 14…
## $ participants <dbl> 150, 2300, 600, 150, 0, 300, 0, 200, 120, 300, 100,…
## $ time_in_seconds <dbl> 113693.8, 79380.9, 103033.1, 90816.6, 79882.1, 1088…
## $ avg_elevation_gain <dbl> 27.0820851, 58.1722320, 59.7604790, 45.6707317, 0.6…
## $ avg_velocity <dbl> 5.284721, 7.741409, 5.835018, 6.501014, 7.206120, 5…
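We will work with solo races that have a positive distance and a complete top 10 (ten finishers with rank at most 10). For each race, the average velocity (km/h) is computed from the mean finishing time of its top 10 finishers, and the average elevation gain is the total elevation gain divided by the race distance.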
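As a quick arithmetic check of the two derived columns, the sketch below plugs in the first race shown in the `glimpse()` output above (Millstone 100). It assumes distance is in kilometres and elevation gain in metres, so `avg_velocity` is in km/h and `avg_elevation_gain` in metres per kilometre; the printed inputs are rounded, so the results match the glimpse values only to a few decimals.

```r
# Values from the first row of glimpse(race_top_results)
distance_km      <- 166.9     # assumed to be kilometres
mean_time_s      <- 113693.8  # mean finishing time of the top 10, in seconds
elevation_gain_m <- 4520      # assumed to be metres

# Same formulas as the mutate() step
avg_velocity       <- distance_km / mean_time_s * 3600  # km per hour
avg_elevation_gain <- elevation_gain_m / distance_km    # metres per km

print(round(avg_velocity, 3))        # 5.285
print(round(avg_elevation_gain, 3))  # 27.082
```

Multiplying by 3600 converts the velocity from km per second to km per hour, which keeps the values in a readable range of roughly 5 to 8.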
set.seed(345129)