14.5 Two Predictor Variables

image code
penguins |>
ggplot(aes(x = flipper_length_mm, y = bill_length_mm, 
           color = species)) + 
  geom_point(size = 3) + 
  geom_segment(aes(x = 195, y = 30, xend = 195, yend = 50),
               color = "black", linetype = 2, linewidth = 2) +
  geom_segment(aes(x = 170, y = 50, xend = 195, yend = 50),
               color = "black", linetype = 2, linewidth = 2) +
  labs(title = "<span style = 'color:#c65ccc'>Two Predictor Variables</span>",
       subtitle = "50mm-long bill and 195mm-long flipper",
       caption = "R4DS Book Club") +
  scale_color_manual(values = c(adelie_color, chinstrap_color, gentoo_color)) +
  theme_minimal() +
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))

Generalizing Bayes’ Rule:

\[f(y | x_{2}, x_{3}) = \frac{f(y) \cdot L(y | x_{2}, x_{3})}{\sum_{y'} f(y') \cdot L(y' | x_{2}, x_{3})}\]

Another “naive” assumption of conditionally independent:

\[L(y | x_{2}, x_{3}) = f(x_{2}, x_{3} | y) = f(x_{2} | y) \cdot f(x_{3} | y)\]

  • mathematically efficient
  • but what about correlation?
# sample statistics of x_3: flipper length
penguins %>% 
  group_by(species) %>% 
  summarize(mean = mean(flipper_length_mm, na.rm = TRUE), 
            sd = sd(flipper_length_mm, na.rm = TRUE))
## # A tibble: 3 × 3
##   species    mean    sd
##   <fct>     <dbl> <dbl>
## 1 Adelie     190.  6.54
## 2 Chinstrap  196.  7.13
## 3 Gentoo     217.  6.48

Likelihoods of a flipper length of 195 mm:

# L(y = A | x_3 = 195) = 0.04554
dnorm(195, mean = 190, sd = 6.54)

# L(y = C | x_3 = 195) = 0.05541
dnorm(195, mean = 196, sd = 7.13)

# L(y = G | x_3 = 195) = 0.0001934
dnorm(195, mean = 217, sd = 6.48)

Total probability:

\[f(x_{2} = 50, x_{3} = 195) = \frac{151}{342} \cdot 0.0000212 \cdot 0.04554 + \frac{68}{342} \cdot 0.112 \cdot 0.05541 + \frac{123}{342} \cdot 0.09317 \cdot 0.0001931 \approx 0.001241\]

Bayes’ Rules:

\[\begin{array}{rcccl} f(y = A | x_{2} = 50, x_{3} = 195) & = & \frac{\frac{151}{342} \cdot 0.0000212 \cdot 0.04554}{0.0001931} & \approx & 0.0003 \\ f(y = C | x_{2} = 50, x_{3} = 195) & = & \frac{\frac{68}{342} \cdot 0.112 \cdot 0.05541}{0.0001931} & \approx & 0.9944 \\ f(y = G | x_{2} = 50, x_{3} = 195) & = & \frac{\frac{123}{342} \cdot 0.09317 \cdot 0.0001931}{0.0001931} & \approx & 0.0052 \\ \end{array}\]

In conclusion, our penguin is almost certainly a Chinstrap.