Bi-Tempered Loss

R Code
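library(ggplot2)
library(ggtext)  # provides element_markdown()

# penguin_class_df, the *_color values, and boundary_intercept / boundary_slope
# are assumed to be defined beforehand.
penguin_class_df |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm,
             color = chinstrap_bool)) +
  geom_point(size = 3) +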
  geom_abline(intercept = boundary_intercept,
              slope = boundary_slope,
              color = adelie_color,
              linewidth = 2,
              linetype = 2) +
  geom_segment(aes(x = 208, y = 37, xend = 203, yend = 42),
               arrow = arrow(length = unit(0.5, "cm")),
               color = gentoo_color,
               linewidth = 2) +
  labs(title = "<span style = 'color:#067476'>Far Misclassification</span>",
       subtitle = "Finding the <span style = 'color:#c65ccc'>Chinstrap</span> penguins",
       caption = "Data Science Learning Community") +
  scale_color_manual(values = c("gray70", chinstrap_color)) +
  theme_minimal() +
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))

With one-hot encoding and mass on class c, the tempered cross entropy loss is

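$$L(c, \hat{y}) = \frac{1}{1 - t_1}\left(1 - \hat{y}_c^{\,1 - t_1}\right) - \frac{1}{2 - t_1}\left(1 - \sum_{c'=1}^{C} \hat{y}_{c'}^{\,2 - t_1}\right)$$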

  • $0 \le t_1 < 1$
  • As $t_1 \to 1.0$, the tempered logarithm reverts to the ordinary log, and the loss reverts to standard cross entropy
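
As a rough illustration of the loss above, the following R sketch evaluates it for a made-up prediction vector. The helper names `log_t` and `tempered_ce` and the example probabilities are assumptions for illustration, not code from the original material. It checks two properties: the loss approaches standard cross entropy as t1 nears 1, and for t1 < 1 it stays bounded by 1 / (1 - t1) even when the true class gets almost no probability (the "far misclassification" case).

```r
# Tempered logarithm: log_t(x) = (x^(1 - t) - 1) / (1 - t).
# At t = 1 it is defined as the ordinary log.
log_t <- function(x, t) {
  if (isTRUE(all.equal(t, 1))) return(log(x))
  (x^(1 - t) - 1) / (1 - t)
}

# Tempered cross entropy for a one-hot label with all mass on `true_class`.
# y_hat is a probability vector over the C classes.
tempered_ce <- function(y_hat, true_class, t1) {
  -log_t(y_hat[true_class], t1) -
    (1 - sum(y_hat^(2 - t1))) / (2 - t1)
}

# Hypothetical predictions for a 3-class problem; the true class is 1.
y_hat <- c(0.7, 0.2, 0.1)
loss_tempered  <- tempered_ce(y_hat, 1, t1 = 0.5)
loss_near_one  <- tempered_ce(y_hat, 1, t1 = 0.999)
loss_standard  <- -log(y_hat[1])

# A badly misclassified point: almost no mass on the true class.
y_far <- c(1e-12, 1 - 1e-12, 0)
loss_far_tempered <- tempered_ce(y_far, 1, t1 = 0.5)  # bounded by 1 / (1 - 0.5)
loss_far_standard <- -log(y_far[1])                   # grows without bound
```

Comparing `loss_far_tempered` with `loss_far_standard` shows why a bounded loss is less sensitive to points that sit far on the wrong side of the decision boundary.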

R Code
penguin_class_df |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm,
             color = chinstrap_bool)) +
  geom_point(size = 3) +
  geom_abline(intercept = boundary_intercept,
              slope = boundary_slope,
              color = adelie_color,
              linewidth = 2,
              linetype = 2) +
  geom_segment(aes(x = 208.5, y = 43, xend = 203.5, yend = 48),
               arrow = arrow(length = unit(0.5, "cm")),
               color = gentoo_color,
               linewidth = 2) +
  labs(title = "<span style = 'color:#067476'>Close Misclassification</span>",
       subtitle = "Finding the <span style = 'color:#c65ccc'>Chinstrap</span> penguins",
       caption = "Data Science Learning Community") +
  scale_color_manual(values = c("gray70", chinstrap_color)) +
  theme_minimal() +
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))

The tempered softmax is

$$\hat{y}_c = \left[1 + (1 - t_2)\left(a_c - \lambda_{t_2}(\mathbf{a})\right)\right]^{1/(1 - t_2)}$$

  • $0 \le t_1 < 1 < t_2$

With the additional constraint $\sum_{c=1}^{C} \hat{y}_c = 1$, we can approximate $\lambda$ with fixed-point iteration (Algorithm 10.2).
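
The normalization step can be sketched in R as follows. This is a minimal illustration of a fixed-point scheme for the normalizer, assuming t2 > 1. It is not a transcription of Algorithm 10.2, and the function names, tolerance, iteration cap, and example logits are all hypothetical. The clamp at zero inside `exp_t` follows the usual definition of the tempered exponential and is likewise an assumption here.

```r
# Tempered exponential: exp_t(x) = max(0, 1 + (1 - t) x)^(1 / (1 - t)), t != 1.
exp_t <- function(x, t) {
  pmax(1 + (1 - t) * x, 0)^(1 / (1 - t))
}

# Tempered logarithm, the inverse of exp_t for t != 1.
log_t <- function(x, t) {
  (x^(1 - t) - 1) / (1 - t)
}

# Fixed-point iteration for the normalizer lambda_t(a), assuming t > 1.
tempered_lambda <- function(a, t, max_iter = 100, tol = 1e-10) {
  stopifnot(t > 1)
  mu <- max(a)
  a0 <- a - mu          # shifted activations, all <= 0
  a_tilde <- a0
  for (i in seq_len(max_iter)) {
    z <- sum(exp_t(a_tilde, t))
    a_new <- z^(1 - t) * a0               # rescale the shifted activations
    converged <- max(abs(a_new - a_tilde)) < tol
    a_tilde <- a_new
    if (converged) break
  }
  z <- sum(exp_t(a_tilde, t))
  -log_t(1 / z, t) + mu
}

# Tempered softmax: apply exp_t after subtracting the normalizer.
tempered_softmax <- function(a, t2) {
  exp_t(a - tempered_lambda(a, t2), t2)
}

a <- c(1, 2, 3)                           # hypothetical logits
y_tempered <- tempered_softmax(a, t2 = 1.5)
y_standard <- exp(a) / sum(exp(a))        # ordinary softmax for comparison
```

With t2 > 1 the tempered softmax has heavier tails than the ordinary softmax, so the lowest-scoring class in `y_tempered` should keep more probability than it does in `y_standard`.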