19.1 Equivocal Results
What the heck is this? Loosely, equivocal results are results that fall within a zone of uncertainty, indicating that the prediction should not be reported.
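(A toy illustration with made-up probabilities: with the usual 0.5 cutoff and a buffer of 0.05, anything within 0.5 ± 0.05 is too close to call.)

p <- c(0.97, 0.52, 0.48, 0.10)
ifelse(abs(p - 0.5) <= 0.05, '[EQ]', ifelse(p > 0.5, 'yes', 'no'))
## [1] "yes"  "[EQ]" "[EQ]" "no"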
Let’s take a soccer example. (Data courtesy of 538.)
df <-
  here::here('data', 'spi_matches_latest.csv') %>%
  read_csv() %>%
  drop_na(score1) %>%
  mutate(w1 = ifelse(score1 > score2, 'yes', 'no') %>% factor()) %>%
  select(
    matches('(team|spi|prob|importance)[12]'), probtie,
    w1, season
  ) %>%
  # we need this as a feature, so it can't be NA
  drop_na(importance1)
# much more data in 2021 season than in older seasons, so use older seasons as test set
trn <- df %>% filter(season == 2021) %>% select(-season)
tst <- df %>% filter(season != 2021) %>% select(-season)

trn %>% count(w1)
## # A tibble: 2 × 2
##   w1        n
##   <fct> <int>
## 1 no     1376
## 2 yes    1054
tst %>% count(w1)
## # A tibble: 2 × 2
##   w1        n
##   <fct> <int>
## 1 no      248
## 2 yes     227
fit <-
  logistic_reg() %>%
  set_engine('glm') %>%
  # 2 variables cuz dumb humans like 2-D plots
  fit(w1 ~ spi1 + importance1, data = trn)

fit %>% tidy()
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept) -0.742     0.107       -6.92 4.50e-12
## 2 spi1         0.0100    0.00259      3.86 1.13e- 4
## 3 importance1  0.00381   0.00250      1.52 1.27e- 1
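A quick sanity check on the coefficients: push the linear predictor through the inverse logit. (Made-up inputs, and the estimates above are rounded, so the number is only approximate.)

# log-odds = intercept + 0.0100 * spi1 + 0.00381 * importance1
plogis(-0.742 + 0.0100 * 70 + 0.00381 * 50)
## roughly 0.54: a decent team in a moderately important match is a slight favorite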
predict_stuff <- function(fit, set) {
  bind_cols(
    fit %>% predict(set),
    fit %>% predict(set, type = 'prob'),
    fit %>% predict(set, type = 'conf_int', std_error = TRUE),
    set
  )
}

preds_tst <- fit %>% predict_stuff(tst)
How’s the model accuracy?
preds_tst %>%
  accuracy(estimate = .pred_class, truth = w1)
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.579
Seems reasonable… so maybe modeling soccer match outcomes with 2 variables ain’t so bad, eh?
Observe the fitted class boundary. (Confidence intervals shown instead of prediction intervals because I didn’t want to use stan… Max please forgive me.)
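(A sketch of one way to draw that boundary, not necessarily the original plot: the p = 0.5 boundary is where the log-odds hit zero, i.e. the line importance1 = (0.742 - 0.0100 * spi1) / 0.00381 using the rounded estimates.)

preds_tst %>%
  ggplot(aes(x = spi1, y = importance1, color = w1)) +
  geom_point(alpha = 0.5) +
  geom_abline(intercept = 0.742 / 0.00381, slope = -0.0100 / 0.00381)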
Use the {probably} package!
lvls <- levels(preds_tst$w1)

preds_tst_eqz <-
  preds_tst %>%
  mutate(.pred_with_eqz = make_two_class_pred(.pred_yes, lvls, buffer = 0.025))

preds_tst_eqz %>% count(.pred_with_eqz)
## # A tibble: 3 × 2
##   .pred_with_eqz     n
##   <clss_prd>     <int>
## 1 [EQ]              50
## 2 no                48
## 3 yes              377
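With buffer = 0.025, the [EQ] rows are exactly those whose estimate landed within 0.5 ± 0.025. A quick check with {probably}'s is_equivocal() helper (a sketch; the min/max should land inside [0.475, 0.525]):

preds_tst_eqz %>%
  filter(is_equivocal(.pred_with_eqz)) %>%
  summarise(min_prob = min(.pred_yes), max_prob = max(.pred_yes))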
Look at how make_two_class_pred changes our confusion matrix.
# All data
preds_tst_eqz %>% conf_mat(w1, .pred_class) %>% autoplot('heatmap')

# Reportable results only
preds_tst_eqz %>% conf_mat(w1, .pred_with_eqz) %>% autoplot('heatmap')
Does the equivocal zone help improve accuracy? How sensitive are accuracy and our reportable rate to the width of the buffer?
eq_zone_results <- function(buffer) {
  preds_tst_eqz <-
    preds_tst %>%
    mutate(.pred_with_eqz = make_two_class_pred(.pred_no, lvls, buffer = buffer))
  acc <- preds_tst_eqz %>% accuracy(w1, .pred_with_eqz)
  rep_rate <- reportable_rate(preds_tst_eqz$.pred_with_eqz)
  tibble(accuracy = acc$.estimate, reportable = rep_rate, buffer = buffer)
}
map_dfr(seq(0, 0.15, length.out = 40), eq_zone_results) %>%
  pivot_longer(c(-buffer)) %>%
  ggplot(aes(x = buffer, y = value, col = name)) +
  geom_step(size = 1.2, alpha = 0.8) +
  labs(y = NULL)
How does the standard error look across the feature space?
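Something like the sketch below (assuming the .std_error column that predict_stuff() pulled in via std_error = TRUE):

preds_tst %>%
  ggplot(aes(x = spi1, y = importance1, color = .std_error)) +
  geom_point(alpha = 0.7) +
  scale_color_viridis_c()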
Makes sense… we’re more uncertain for cases outside of the normal boundary of our data.