14.3 One Categorical Predictor

Suppose an Antarctic researcher comes across a penguin that weighs less than 4200g and has a 195mm-long flipper and a 50mm-long bill. Our goal is to help this researcher identify the species of this penguin: Adelie, Chinstrap, or Gentoo.

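# Packages this plot relies on: tidyverse (dplyr, tidyr, ggplot2) and ggtext (element_markdown).
# Assumption: `penguins` is a data frame with an `above_average_weight` column,
# e.g. bayesrules::penguins_bayes.
library(tidyverse)
library(ggtext)

penguins |>
  drop_na(above_average_weight) |>
  ggplot(aes(fill = above_average_weight, x = species)) + 
  # position = "fill" rescales each bar to proportions within a species
  geom_bar(position = "fill") + 
  labs(title = "<span style = 'color:#067476'>For which species is a<br>below-average weight most likely?</span>",
       subtitle = "(focus on the <span style = 'color:#c65ccc'>below-average</span> category)",
       caption = "R4DS Book Club") +
  scale_fill_manual(values = c("#c65ccc", "#fb7504")) +
  theme_minimal() +
  # element_markdown() renders the HTML spans in the title and subtitle
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))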

14.3.1 Recall: Bayes Rule

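$$f(y \mid x_1) = \frac{\text{prior} \cdot \text{likelihood}}{\text{normalizing constant}} = \frac{f(y) \, L(y \mid x_1)}{f(x_1)}$$

where, by the Law of Total Probability,

$$\begin{aligned}
f(x_1) &= \sum_{\text{all } y} f(y) \, L(y \mid x_1) \\
       &= f(y = A) \, L(y = A \mid x_1) + f(y = C) \, L(y = C \mid x_1) + f(y = G) \, L(y = G \mid x_1)
\end{aligned}$$

over our three penguin species.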
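The formula above can be sketched as a small R helper. This is a minimal illustration, not code from the source: the function name `bayes_posterior` and the toy prior and likelihood values are hypothetical and are not penguin data.

```r
# Hypothetical helper (not from the source): Bayes' Rule for a categorical y.
# `prior` and `likelihood` are numeric vectors with one entry per category.
bayes_posterior <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood))
  # Law of Total Probability: sum of prior * likelihood over all categories
  normalizing_constant <- sum(prior * likelihood)
  prior * likelihood / normalizing_constant
}

# Toy numbers chosen only for illustration (assumption, not real data)
toy_prior      <- c(A = 0.5, C = 0.3, G = 0.2)
toy_likelihood <- c(A = 0.1, C = 0.4, G = 0.8)

toy_posterior <- bayes_posterior(toy_prior, toy_likelihood)
print(round(toy_posterior, 4))
```

Because every posterior is divided by the same normalizing constant, the results always sum to 1 across the categories.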

14.3.2 Calculation

penguins %>% 
  select(species, above_average_weight) %>% 
  na.omit() %>% 
  tabyl(species, above_average_weight) %>% 
  adorn_totals(c("row", "col"))
##    species   0   1 Total
##     Adelie 126  25   151
##  Chinstrap  61   7    68
##     Gentoo   6 117   123
##      Total 193 149   342

Prior probabilities:

$$f(y = A) = \frac{151}{342}, \quad f(y = C) = \frac{68}{342}, \quad f(y = G) = \frac{123}{342}$$

Likelihoods:

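$$\begin{aligned}
L(y = A \mid x_1 = 0) &= \frac{126}{151} \approx 0.8344 \\
L(y = C \mid x_1 = 0) &= \frac{61}{68} \approx 0.8971 \\
L(y = G \mid x_1 = 0) &= \frac{6}{123} \approx 0.0488
\end{aligned}$$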

Total probability:

$$f(x_1 = 0) = \frac{151}{342} \cdot \frac{126}{151} + \frac{68}{342} \cdot \frac{61}{68} + \frac{123}{342} \cdot \frac{6}{123} = \frac{193}{342}$$
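As a check on the arithmetic so far, the priors, likelihoods, and total probability can be recomputed in base R. This sketch types the counts in by hand from the `tabyl()` output above; the object names (`counts`, `prior`, `likelihood_0`, `f_x0`) are illustrative choices, not names from the source.

```r
# Counts copied from the tabyl() output above (rows: species, columns: x1)
counts <- matrix(
  c(126,  25,
     61,   7,
      6, 117),
  nrow = 3, byrow = TRUE,
  dimnames = list(species = c("Adelie", "Chinstrap", "Gentoo"),
                  above_average_weight = c("0", "1"))
)

# Prior: share of each species among all 342 penguins
prior <- rowSums(counts) / sum(counts)

# Likelihood of x1 = 0 (below-average weight) within each species
likelihood_0 <- counts[, "0"] / rowSums(counts)

# Law of Total Probability: marginal probability of x1 = 0
f_x0 <- sum(prior * likelihood_0)

print(round(prior, 4))
print(round(likelihood_0, 4))
print(f_x0)
```

The species totals cancel in each product, which is why the sum collapses to the column total over the grand total, 193/342.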

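Bayes' Rule:

$$\begin{aligned}
f(y = A \mid x_1 = 0) &= \frac{f(y = A) \, L(y = A \mid x_1 = 0)}{f(x_1 = 0)} = \frac{\frac{151}{342} \cdot \frac{126}{151}}{\frac{193}{342}} \approx 0.6528 \\
f(y = C \mid x_1 = 0) &= \frac{f(y = C) \, L(y = C \mid x_1 = 0)}{f(x_1 = 0)} = \frac{\frac{68}{342} \cdot \frac{61}{68}}{\frac{193}{342}} \approx 0.3161 \\
f(y = G \mid x_1 = 0) &= \frac{f(y = G) \, L(y = G \mid x_1 = 0)}{f(x_1 = 0)} = \frac{\frac{123}{342} \cdot \frac{6}{123}}{\frac{193}{342}} \approx 0.0311
\end{aligned}$$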

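The posterior probability that this penguin is an Adelie (about 0.6528) is more than double that of the next most likely species, Chinstrap (about 0.3161), and far larger than that of Gentoo (about 0.0311).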
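With a single categorical predictor, the posterior probabilities reduce to column proportions of the count table, since the species totals and the grand total cancel. The following base R sketch illustrates this under the assumption that the counts are typed in by hand from the `tabyl()` output; the object names are illustrative, not from the source.

```r
# Counts copied from the tabyl() output (rows: species, columns: x1)
counts <- matrix(
  c(126,  25,
     61,   7,
      6, 117),
  nrow = 3, byrow = TRUE,
  dimnames = list(species = c("Adelie", "Chinstrap", "Gentoo"),
                  above_average_weight = c("0", "1"))
)

# Column-wise proportions: each column is f(y | x1) for that value of x1
posterior <- prop.table(counts, margin = 2)

# Posterior for a below-average-weight penguin (x1 = 0)
posterior_0 <- posterior[, "0"]
print(round(posterior_0, 4))

# Most probable species given x1 = 0
most_likely <- names(which.max(posterior_0))
print(most_likely)
```

The `"1"` column of `posterior` gives the corresponding posteriors for an above-average-weight penguin, which the hand calculation above does not cover.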