14.5 Two Predictor Variables

# Requires ggplot2 and ggtext (for element_markdown); adelie_color,
# chinstrap_color, and gentoo_color are assumed to be defined earlier.
penguins |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm,
             color = species)) +
  geom_point(size = 3) +
  # dashed guides marking the new penguin: 195 mm flipper, 50 mm bill;
  # annotate() draws each segment once instead of once per data row
  annotate("segment", x = 195, y = 30, xend = 195, yend = 50,
           color = "black", linetype = 2, linewidth = 2) +
  annotate("segment", x = 170, y = 50, xend = 195, yend = 50,
           color = "black", linetype = 2, linewidth = 2) +
  labs(title = "<span style = 'color:#c65ccc'>Two Predictor Variables</span>",
       subtitle = "50mm-long bill and 195mm-long flipper",
       caption = "R4DS Book Club") +
  scale_color_manual(values = c(adelie_color, chinstrap_color, gentoo_color)) +
  theme_minimal() +
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))

Generalizing Bayes’ Rule:

$$f(y \mid x_2, x_3) = \frac{f(y) \, L(y \mid x_2, x_3)}{\sum_{y'} f(y') \, L(y' \mid x_2, x_3)}$$

Another “naive” assumption, that the predictors are conditionally independent given the species:

$$L(y \mid x_2, x_3) = f(x_2, x_3 \mid y) = f(x_2 \mid y) \, f(x_3 \mid y)$$

  • mathematically efficient
  • but what about correlation?
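
One way to probe the correlation question is to look at how strongly the two predictors move together within each species. The sketch below assumes `penguins` is the `palmerpenguins::penguins` data frame (the notes do not show where it is loaded); `within_species_cor` and `cor_bill_flipper` are names made up for illustration.

```r
library(dplyr)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Within-species correlation between bill length (x_2) and flipper length (x_3).
# Values far from 0 suggest the conditional-independence assumption is only
# an approximation for that species.
within_species_cor <- penguins |>
  group_by(species) |>
  summarize(cor_bill_flipper = cor(bill_length_mm, flipper_length_mm,
                                   use = "complete.obs"))

within_species_cor
```

Naive Bayes can still classify well when these correlations are nonzero, but the posterior probabilities it reports should then be read with some caution.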
# sample statistics of x_3: flipper length
penguins %>% 
  group_by(species) %>% 
  summarize(mean = mean(flipper_length_mm, na.rm = TRUE), 
            sd = sd(flipper_length_mm, na.rm = TRUE))
## # A tibble: 3 × 3
##   species    mean    sd
##   <fct>     <dbl> <dbl>
## 1 Adelie     190.  6.54
## 2 Chinstrap  196.  7.13
## 3 Gentoo     217.  6.48

Likelihoods of a flipper length of 195 mm:

# L(y = A | x_3 = 195) = 0.04554
dnorm(195, mean = 190, sd = 6.54)

# L(y = C | x_3 = 195) = 0.05541
dnorm(195, mean = 196, sd = 7.13)

# L(y = G | x_3 = 195) = 0.0001934
dnorm(195, mean = 217, sd = 6.48)

Total probability:

$$f(x_2 = 50, x_3 = 195) = \frac{151}{342} \cdot 0.0000212 \cdot 0.04554 + \frac{68}{342} \cdot 0.112 \cdot 0.05541 + \frac{123}{342} \cdot 0.09317 \cdot 0.0001934 \approx 0.001241$$
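
The total probability can be reproduced in a few lines of base R. This is a minimal sketch: the priors and the bill-length likelihoods are copied from the expression above, the flipper-length likelihoods are recomputed with `dnorm()` from the rounded sample statistics, and the variable names (`prior`, `lik_bill`, `lik_flipper`, `terms`, `marginal`) are illustrative.

```r
# Priors f(y): species counts out of 342 penguins
prior <- c(Adelie = 151, Chinstrap = 68, Gentoo = 123) / 342

# Bill-length likelihoods L(y | x_2 = 50), copied from the expression above
lik_bill <- c(Adelie = 0.0000212, Chinstrap = 0.112, Gentoo = 0.09317)

# Flipper-length likelihoods L(y | x_3 = 195) from the rounded means and sds
lik_flipper <- dnorm(195, mean = c(190, 196, 217), sd = c(6.54, 7.13, 6.48))

# One term per species: f(y) * L(y | x_2) * L(y | x_3)
terms <- prior * lik_bill * lik_flipper

# Marginal f(x_2 = 50, x_3 = 195): the sum over all species
marginal <- sum(terms)
round(marginal, 6)
```

The Chinstrap term dominates the sum, which already hints at the classification result.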

Bayes’ Rule:

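$$\begin{aligned}
f(y = A \mid x_2 = 50, x_3 = 195) &= \frac{\frac{151}{342} \cdot 0.0000212 \cdot 0.04554}{0.001241} \approx 0.0003 \\
f(y = C \mid x_2 = 50, x_3 = 195) &= \frac{\frac{68}{342} \cdot 0.112 \cdot 0.05541}{0.001241} \approx 0.9944 \\
f(y = G \mid x_2 = 50, x_3 = 195) &= \frac{\frac{123}{342} \cdot 0.09317 \cdot 0.0001934}{0.001241} \approx 0.0052
\end{aligned}$$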
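
The three posterior calculations share one pattern: multiply the prior by each predictor's likelihood, then divide by the sum over species. A small helper makes that explicit. `naive_bayes_posterior()` is a hypothetical function written for these notes, not part of any package, and the bill-length likelihoods are again taken as given from the calculation above.

```r
# Hypothetical helper: naive Bayes posterior for any number of predictors.
# `prior` is a named vector of f(y); each extra argument is a vector of
# likelihoods L(y | x_j) in the same species order.
naive_bayes_posterior <- function(prior, ...) {
  likelihoods <- list(...)
  # Naive assumption: multiply the per-predictor likelihoods within each class
  joint <- Reduce(`*`, likelihoods, prior)
  # Normalize so the posterior probabilities sum to 1
  joint / sum(joint)
}

prior <- c(Adelie = 151, Chinstrap = 68, Gentoo = 123) / 342
lik_bill <- c(0.0000212, 0.112, 0.09317)  # L(y | x_2 = 50), from the notes
lik_flipper <- dnorm(195, mean = c(190, 196, 217), sd = c(6.54, 7.13, 6.48))

posterior <- naive_bayes_posterior(prior, lik_bill, lik_flipper)
round(posterior, 4)
```

Because the helper only multiplies and normalizes, adding a third predictor would just mean passing one more likelihood vector.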

In conclusion, our penguin is almost certainly a Chinstrap.