Bi-Tempered Loss
R Code
penguin_class_df |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm,
             color = chinstrap_bool)) +
  geom_point(size = 3) +
  geom_abline(intercept = boundary_intercept,
              slope = boundary_slope,
              color = adelie_color,
              linewidth = 2,
              linetype = 2) +
  geom_segment(aes(x = 208, y = 37, xend = 203, yend = 42),
               arrow = arrow(length = unit(0.5, "cm")),
               color = gentoo_color,
               linewidth = 2) +
  labs(title = "<span style = 'color:#067476'>Far Misclassification</span>",
       subtitle = "Finding the <span style = 'color:#c65ccc'>Chinstrap</span> penguins",
       caption = "Data Science Learning Community") +
  scale_color_manual(values = c("gray70", chinstrap_color)) +
  theme_minimal() +
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))
With one-hot encoding and mass on class c, the tempered cross entropy loss is
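$$
\mathcal{L}(c, \hat{\mathbf{y}}) = \frac{1}{1 - t_1}\left(1 - \hat{y}_c^{\,1 - t_1}\right) - \frac{1}{2 - t_1}\left(1 - \sum_{c'=1}^{C} \hat{y}_{c'}^{\,2 - t_1}\right)
$$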
- $0 \le t_1 < 1$
- As $t_1 \to 1$, the tempered logarithm reverts to the ordinary log and the loss reduces to standard cross entropy
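The loss above can be sketched in a few lines of R. This is a minimal illustration rather than a reference implementation: the helper names `log_t` and `tempered_cross_entropy`, the argument `true_class`, and the example probability vectors are all assumptions made for this sketch. It checks the limit $t_1 \to 1$ numerically and shows that, for $t_1 < 1$, the loss stays below $1/(1 - t_1)$ even for a confidently wrong prediction, which is why a far misclassification cannot dominate the fit.

```r
# Tempered logarithm: log_t(x) = (x^(1 - t) - 1) / (1 - t); ordinary log when t = 1.
# (Helper name is hypothetical, chosen for this sketch.)
log_t <- function(x, t) {
  if (t == 1) return(log(x))
  (x^(1 - t) - 1) / (1 - t)
}

# Tempered cross entropy for a one-hot label with all mass on `true_class`.
# y_hat is a probability vector over the C classes.
tempered_cross_entropy <- function(y_hat, true_class, t1) {
  -log_t(y_hat[true_class], t1) -
    (1 - sum(y_hat^(2 - t1))) / (2 - t1)
}

# Made-up predictions for illustration only.
y_hat <- c(0.7, 0.2, 0.1)          # mostly right about class 1
y_far <- c(1e-6, 1 - 1e-6, 0)      # confidently wrong about class 1

loss_tempered <- tempered_cross_entropy(y_hat, true_class = 1, t1 = 0.5)
loss_near_one <- tempered_cross_entropy(y_hat, true_class = 1, t1 = 1 - 1e-6)
loss_standard <- -log(y_hat[1])    # standard cross entropy for comparison

# Confidently wrong prediction: bounded tempered loss vs. large log loss.
loss_far_tempered <- tempered_cross_entropy(y_far, true_class = 1, t1 = 0.5)
loss_far_standard <- -log(y_far[1])

print(c(tempered = loss_tempered, near_one = loss_near_one,
        standard = loss_standard))
print(c(far_tempered = loss_far_tempered, far_standard = loss_far_standard))
```

Lowering `t1` tightens the bound $1/(1 - t_1)$, so outliers far from the decision boundary contribute less to the total loss.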
R Code
penguin_class_df |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm,
             color = chinstrap_bool)) +
  geom_point(size = 3) +
  geom_abline(intercept = boundary_intercept,
              slope = boundary_slope,
              color = adelie_color,
              linewidth = 2,
              linetype = 2) +
  geom_segment(aes(x = 208.5, y = 43, xend = 203.5, yend = 48),
               arrow = arrow(length = unit(0.5, "cm")),
               color = gentoo_color,
               linewidth = 2) +
  labs(title = "<span style = 'color:#067476'>Close Misclassification</span>",
       subtitle = "Finding the <span style = 'color:#c65ccc'>Chinstrap</span> penguins",
       caption = "Data Science Learning Community") +
  scale_color_manual(values = c("gray70", chinstrap_color)) +
  theme_minimal() +
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))
The tempered softmax is
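$$
\hat{y}_c = \left[1 + (1 - t_2)\left(a_c - \lambda_{t_2}(\mathbf{a})\right)\right]^{1/(1 - t_2)}
$$
- $0 \le t_1 < 1 < t_2$
With the additional constraint $\sum_{c=1}^{C} \hat{y}_c = 1$, we can approximate $\lambda_{t_2}(\mathbf{a})$ with fixed-point iteration (Algorithm 10.2).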
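One way the fixed-point iteration might look in R is sketched below. It follows the commonly published form of the procedure (shift the activations by their maximum, repeatedly rescale them by $Z^{1 - t_2}$, then recover $\lambda$ from the final $Z$); whether this matches Algorithm 10.2 line for line is an assumption. The function names, the `max_iter` and `tol` arguments, and the example activations are hypothetical.

```r
# Tempered log and exp (helper names are hypothetical, chosen for this sketch).
log_t <- function(x, t) {
  if (t == 1) return(log(x))
  (x^(1 - t) - 1) / (1 - t)
}
exp_t <- function(x, t) {
  if (t == 1) return(exp(x))
  (1 + (1 - t) * x)^(1 / (1 - t))
}

# Tempered softmax for t2 > 1, with lambda found by fixed-point iteration.
tempered_softmax <- function(a, t2, max_iter = 50, tol = 1e-12) {
  stopifnot(t2 > 1)
  a0 <- a - max(a)                  # shift so the largest activation is 0
  a_tilde <- a0
  for (i in seq_len(max_iter)) {
    z <- sum(exp_t(a_tilde, t2))    # current normalizer estimate
    a_new <- z^(1 - t2) * a0        # rescale the shifted activations
    converged <- max(abs(a_new - a_tilde)) < tol
    a_tilde <- a_new
    if (converged) break
  }
  z <- sum(exp_t(a_tilde, t2))
  lambda0 <- -log_t(1 / z, t2)      # lambda for the shifted activations
  exp_t(a0 - lambda0, t2)           # same as exp_t(a - (lambda0 + max(a)), t2)
}

# Made-up activations for illustration only.
a <- c(1, 0, -1)
y_tempered <- tempered_softmax(a, t2 = 2)
y_standard <- exp(a) / sum(exp(a))  # ordinary softmax for comparison

print(rbind(tempered = y_tempered, standard = y_standard))
print(sum(y_tempered))
```

With $t_2 > 1$ the tempered softmax has heavier tails than the ordinary softmax, so the least likely class keeps more probability mass.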