21.2 Tidy method from the {broom} package
- predictable outcome for many different models and statistical tests
- always a tibble
- consistent column names
- most useful for analysing / visualizing multiple models/tests
- easier to combine results (no rownames)
- also used internally by higher level functions in tidymodels packages
- other packages also provide tidy methods for their own data structures
- different models and tests will have different structures based on what makes sense, but use as similar a structure as possible
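The "easier to combine results" point can be sketched as follows. This is a minimal illustration, not part of the chapter's analysis: it uses the built-in mtcars data and three made-up formulas (both are assumptions chosen so the example is self-contained), fits each model with lm(), and stacks the tidied coefficients into one tibble.

```r
library(broom)
library(dplyr)
library(purrr)

# Hypothetical formulas on the built-in mtcars data, for illustration only
formulas <- list(
  wt_only    = mpg ~ wt,
  wt_hp      = mpg ~ wt + hp,
  wt_hp_qsec = mpg ~ wt + hp + qsec
)

# Fit each model, tidy it, and stack the results.
# .id stores the list names in a "model" column.
all_coefs <- formulas %>%
  map(~ lm(.x, data = mtcars)) %>%
  map(tidy) %>%
  bind_rows(.id = "model")

all_coefs
```

Because every tidied result has the same column names and no rownames, bind_rows() can stack them directly, and the term names stay available as an ordinary column for filtering or plotting.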
You can get the same outcome from many different input formats.
race_top_results %>%
  ggplot(aes(avg_elevation_gain, avg_velocity)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  expand_limits(y = 0)
As expected, higher elevation gain per mile is associated with lower velocity.
lm_spec <- linear_reg() %>% set_engine("lm")

wf <- workflow() %>%
  add_model(lm_spec) %>%
  add_formula(avg_velocity ~ avg_elevation_gain + distance)

fitted_wf <- wf %>%
  fit(race_top_results)

fitted_wf %>% tidy()
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 12.6 0.757 16.7 5.13e-55
## 2 avg_elevation_gain -0.0514 0.00247 -20.8 5.13e-80
## 3 distance -0.0243 0.00467 -5.21 2.35e- 7
lm_spec %>%
  fit(avg_velocity ~ avg_elevation_gain + distance, data = race_top_results) %>%
  tidy()
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 12.6 0.757 16.7 5.13e-55
## 2 avg_elevation_gain -0.0514 0.00247 -20.8 5.13e-80
## 3 distance -0.0243 0.00467 -5.21 2.35e- 7
{broom} predates {tidymodels}, so it also works on base R lm model objects.
fitted_wf %>% extract_fit_engine() %>% tidy()
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 12.6 0.757 16.7 5.13e-55
## 2 avg_elevation_gain -0.0514 0.00247 -20.8 5.13e-80
## 3 distance -0.0243 0.00467 -5.21 2.35e- 7
lm(avg_velocity ~ avg_elevation_gain + distance, data = race_top_results) %>%
tidy()
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 12.6 0.757 16.7 5.13e-55
## 2 avg_elevation_gain -0.0514 0.00247 -20.8 5.13e-80
## 3 distance -0.0243 0.00467 -5.21 2.35e- 7
In addition to models, we can tidy the results of tests such as a correlation test or a t-test.
cor.test(race_top_results$avg_velocity, race_top_results$avg_elevation_gain) %>%
tidy()
## # A tibble: 1 × 8
## estimate statistic p.value parameter conf.low conf.high method alternative
## <dbl> <dbl> <dbl> <int> <dbl> <dbl> <chr> <chr>
## 1 -0.561 -21.1 9.87e-82 970 -0.603 -0.517 Pearson'… two.sided
t.test(
  race_top_results %>% filter(date > '2015-01-01') %>% pull(avg_velocity),
  race_top_results %>% filter(date <= '2015-01-01') %>% pull(avg_velocity)
) %>%
  tidy()
## # A tibble: 1 × 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.423 6.78 7.21 -4.26 0.0000248 486. -0.618 -0.228
## # ℹ 2 more variables: method <chr>, alternative <chr>
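Since a tidied test is always a one-row tibble, running the same test on several subsets and comparing the results takes only a few lines. The sketch below is illustrative only: it uses the built-in mtcars data split by cyl (an assumption, not the race data from this chapter) and runs one correlation test per group.

```r
library(broom)
library(dplyr)
library(purrr)

# One correlation test (mpg vs. wt) per cylinder group of mtcars,
# each tidied to a one-row tibble and stacked; .id keeps the group label
cor_by_cyl <- mtcars %>%
  split(.$cyl) %>%
  map(~ cor.test(.x$mpg, .x$wt)) %>%
  map(tidy) %>%
  bind_rows(.id = "cyl")

cor_by_cyl
```

The result has one row per group, with the same estimate, statistic, p.value, and confidence interval columns as the single cor.test() output above.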