14.5 Two Predictor Variables
library(ggplot2)
library(ggtext)          # element_markdown()
library(palmerpenguins)  # penguins data

# adelie_color, chinstrap_color and gentoo_color are assumed to be defined earlier in the notes
penguins |>
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm,
             color = species)) +
  geom_point(size = 3) +
  # dashed guide lines marking a 195 mm flipper and a 50 mm bill (drawn once, not once per row)
  annotate("segment", x = 195, y = 30, xend = 195, yend = 50,
           color = "black", linetype = 2, linewidth = 2) +
  annotate("segment", x = 170, y = 50, xend = 195, yend = 50,
           color = "black", linetype = 2, linewidth = 2) +
  labs(title = "<span style = 'color:#c65ccc'>Two Predictor Variables</span>",
       subtitle = "50mm-long bill and 195mm-long flipper",
       caption = "R4DS Book Club") +
  scale_color_manual(values = c(adelie_color, chinstrap_color, gentoo_color)) +
  theme_minimal() +
  theme(plot.title = element_markdown(face = "bold", size = 24),
        plot.subtitle = element_markdown(size = 16))
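Generalizing Bayes’ Rule to two predictors, bill length $x_2$ and flipper length $x_3$:

$$f(y \mid x_2, x_3) = \frac{f(y)\,L(y \mid x_2, x_3)}{\sum_{y'} f(y')\,L(y' \mid x_2, x_3)}$$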
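The generalized rule multiplies each class's prior by its likelihood and then rescales the results so that they sum to one. The following is a minimal base-R sketch. It assumes the priors and likelihoods are already available as numeric vectors. `bayes_posterior()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper (not from any package): Bayes' Rule over a finite set of classes.
# prior:      f(y) for each class y (may be named)
# likelihood: L(y | x2, x3) for each class, in the same order as prior
bayes_posterior <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood))
  numerator <- prior * likelihood   # f(y) * L(y | x2, x3)
  numerator / sum(numerator)        # divide by the sum over all classes y'
}

# Toy example with made-up numbers: two equally likely classes
bayes_posterior(prior = c(a = 0.5, b = 0.5), likelihood = c(0.2, 0.6))
# a = 0.25, b = 0.75
```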
Another “naive” assumption is that the predictors are conditionally independent given the species $y$:

$$L(y \mid x_2, x_3) = f(x_2, x_3 \mid y) = f(x_2 \mid y)\,f(x_3 \mid y)$$
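- mathematically efficient: the joint likelihood reduces to a product of one-dimensional densities, one per predictor
- but what about correlation? If bill length and flipper length are correlated within a species, the product ignores that dependence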
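To see what the assumption gives up, we can compare the product of two Normal densities with a bivariate Normal density that allows a within-class correlation `rho`. This sketch assumes both predictors are Normal within a class. `naive_likelihood()` and `dbvnorm()` are hypothetical helpers written out by hand, and the inputs are made-up standardized values. The two densities agree exactly when `rho = 0` and diverge otherwise.

```r
# Hypothetical helper: naive likelihood f(x2 | y) * f(x3 | y),
# treating x2 and x3 as independent Normals within class y
naive_likelihood <- function(x2, x3, mean2, sd2, mean3, sd3) {
  dnorm(x2, mean = mean2, sd = sd2) * dnorm(x3, mean = mean3, sd = sd3)
}

# Hypothetical helper: bivariate Normal density f(x2, x3 | y) with correlation rho
dbvnorm <- function(x2, x3, mean2, sd2, mean3, sd3, rho) {
  z2 <- (x2 - mean2) / sd2
  z3 <- (x3 - mean3) / sd3
  q  <- (z2^2 - 2 * rho * z2 * z3 + z3^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sd2 * sd3 * sqrt(1 - rho^2))
}

# Made-up standardized inputs: both predictors one sd above their class means
naive_likelihood(1, 1, 0, 1, 0, 1)
dbvnorm(1, 1, 0, 1, 0, 1, rho = 0)     # identical to the naive product
dbvnorm(1, 1, 0, 1, 0, 1, rho = 0.6)   # larger: positive correlation makes (1, 1) more plausible
```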
library(dplyr)
library(palmerpenguins)

# sample statistics of x_3: flipper length, by species
penguins %>%
  group_by(species) %>%
  summarize(mean = mean(flipper_length_mm, na.rm = TRUE),
            sd = sd(flipper_length_mm, na.rm = TRUE))
## # A tibble: 3 × 3
## species mean sd
## <fct> <dbl> <dbl>
## 1 Adelie 190. 6.54
## 2 Chinstrap 196. 7.13
## 3 Gentoo 217. 6.48
Likelihoods of a flipper length of 195 mm:
# L(y = A | x_3 = 195) = 0.04554
dnorm(195, mean = 190, sd = 6.54)
# L(y = C | x_3 = 195) = 0.05541
dnorm(195, mean = 196, sd = 7.13)
# L(y = G | x_3 = 195) = 0.0001934
dnorm(195, mean = 217, sd = 6.48)
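Because `dnorm()` is vectorized over `mean` and `sd`, a single call can compute all three likelihoods. The following small sketch uses the rounded summaries from the table above. The z-scores show why the Gentoo likelihood is so small: 195 mm sits roughly 3.4 standard deviations below the Gentoo mean.

```r
# Rounded flipper-length summaries (x_3) by species, from the table above
flipper_mean <- c(Adelie = 190, Chinstrap = 196, Gentoo = 217)
flipper_sd   <- c(Adelie = 6.54, Chinstrap = 7.13, Gentoo = 6.48)

# One vectorized call: L(y | x_3 = 195) for every species y
flipper_lik <- setNames(dnorm(195, mean = flipper_mean, sd = flipper_sd),
                        names(flipper_mean))
signif(flipper_lik, 4)

# How many standard deviations 195 mm lies from each species mean
round((195 - flipper_mean) / flipper_sd, 2)
```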
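Total probability (the denominator of Bayes’ Rule), using the Gentoo flipper likelihood of 0.0001934 computed above:

$$f(x_2 = 50, x_3 = 195) = \frac{151}{342} \cdot 0.0000212 \cdot 0.04554 + \frac{68}{342} \cdot 0.112 \cdot 0.05541 + \frac{123}{342} \cdot 0.09317 \cdot 0.0001934 \approx 0.001241$$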
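The same arithmetic can be done in R. This sketch hard-codes the priors (151/342, 68/342, 123/342) and the bill-length likelihoods from the equation. It also hard-codes the flipper-length likelihoods reported in the `dnorm()` comments above.

```r
# Prior f(y) for each species
prior <- c(Adelie = 151, Chinstrap = 68, Gentoo = 123) / 342
# L(y | x_2 = 50): likelihoods of a 50 mm bill
bill_lik <- c(Adelie = 0.0000212, Chinstrap = 0.112, Gentoo = 0.09317)
# L(y | x_3 = 195): likelihoods of a 195 mm flipper, from the dnorm() calls
flipper_lik <- c(Adelie = 0.04554, Chinstrap = 0.05541, Gentoo = 0.0001934)

# Naive assumption: multiply the two likelihoods within each species
joint <- prior * bill_lik * flipper_lik
# Law of total probability: summing over species gives f(x_2 = 50, x_3 = 195)
total <- sum(joint)
signif(total, 4)
```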
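Applying Bayes’ Rule, each numerator is divided by the total probability $f(x_2 = 50, x_3 = 195) \approx 0.001241$:

$$\begin{aligned}
f(y = A \mid x_2 = 50, x_3 = 195) &= \frac{\frac{151}{342} \cdot 0.0000212 \cdot 0.04554}{0.001241} \approx 0.0003 \\
f(y = C \mid x_2 = 50, x_3 = 195) &= \frac{\frac{68}{342} \cdot 0.112 \cdot 0.05541}{0.001241} \approx 0.9944 \\
f(y = G \mid x_2 = 50, x_3 = 195) &= \frac{\frac{123}{342} \cdot 0.09317 \cdot 0.0001934}{0.001241} \approx 0.0052
\end{aligned}$$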
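Normalizing the per-species products gives the posterior directly. This sketch reuses the same hard-coded inputs. Dividing by `sum(joint)` plays the role of the total probability $f(x_2 = 50, x_3 = 195)$, and `which.max()` picks the classification.

```r
prior       <- c(Adelie = 151, Chinstrap = 68, Gentoo = 123) / 342
bill_lik    <- c(Adelie = 0.0000212, Chinstrap = 0.112, Gentoo = 0.09317)
flipper_lik <- c(Adelie = 0.04554, Chinstrap = 0.05541, Gentoo = 0.0001934)

# Numerators of Bayes' Rule: f(y) * L(y | x_2) * L(y | x_3)
joint <- prior * bill_lik * flipper_lik
# Divide by the total probability so the posteriors sum to 1
posterior <- joint / sum(joint)
round(posterior, 4)
# Adelie 0.0003, Chinstrap 0.9944, Gentoo 0.0052

# Classify as the species with the highest posterior probability
names(which.max(posterior))
# "Chinstrap"
```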
In conclusion, our penguin is almost certainly a Chinstrap.