Bayes and Logistic Regression
The class posterior distribution for a Naive Bayes classification model has the same form as multinomial logistic regression:
\[p(y = c|\vec{x}, \vec{\theta}) = \displaystyle\frac{e^{\beta_{c}^{T}\vec{x} + \gamma_{c}}}{\displaystyle\sum_{c'=1}^{C} e^{\beta_{c'}^{T}\vec{x} + \gamma_{c'}}}\]
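One way to see the connection (a sketch, assuming Gaussian class-conditional densities with class-independent per-feature variances \(\sigma_j^2\) — an assumption not fixed by the notes above): the log of the joint \(p(y=c)\,p(\vec{x}\,|\,y=c)\) is linear in \(\vec{x}\),

\[\log \pi_{c} + \sum_{j=1}^{p} \log \mathcal{N}(x_{j};\, \mu_{jc}, \sigma_{j}^{2}) = \sum_{j=1}^{p} \frac{\mu_{jc}}{\sigma_{j}^{2}} x_{j} + \left(\log \pi_{c} - \sum_{j=1}^{p} \frac{\mu_{jc}^{2}}{2\sigma_{j}^{2}}\right) + \mathrm{const}(\vec{x}),\]

so taking \(\beta_{c} = (\mu_{1c}/\sigma_{1}^{2}, \ldots, \mu_{pc}/\sigma_{p}^{2})\) and \(\gamma_{c}\) equal to the parenthesized term, normalizing over classes cancels the \(\vec{x}\)-only constant and yields exactly the softmax form above.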
Naive Bayes
\[f(y | x_{1}, x_{2}, ..., x_{p}) = \frac{f(y) \cdot L(y | x_{1}, x_{2}, ..., x_{p})}{\sum_{y'} f(y') \cdot L(y' | x_{1}, x_{2}, ..., x_{p})}\]
- features assumed conditionally independent given the class \(\rightarrow\) computationally efficient
- generalizes to more than two categories
- the independence assumption is commonly violated in practice
- optimizes joint likelihood \(\displaystyle\prod_{n} p(y_{n},\vec{x}_{n}|\vec{\theta})\)
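The posterior formula above can be sketched numerically. A minimal Gaussian Naive Bayes in NumPy (an illustration under the assumption of continuous features with normal class-conditionals; `fit_nb` and `predict_nb` are made-up names, not a library API):

```python
import numpy as np

def fit_nb(X, y):
    # Estimate prior f(y) and per-feature Gaussian parameters per class.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),        # prior f(y)
                     Xc.mean(axis=0),         # per-feature means
                     Xc.var(axis=0) + 1e-9)   # per-feature variances
    return params

def predict_nb(X, params):
    # Pick the class maximizing log f(y) + sum_j log N(x_j | mu_j, var_j);
    # the sum over features is the conditional-independence assumption.
    def argmax_post(x):
        scores = {}
        for c, (prior, mu, var) in params.items():
            ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            scores[c] = np.log(prior) + ll
        return max(scores, key=scores.get)
    return np.array([argmax_post(x) for x in X])

# toy data: two well-separated clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

params = fit_nb(X, y)
accuracy = (predict_nb(X, params) == y).mean()
```

Because the clusters are far apart relative to their spread, the fitted model separates them almost perfectly.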
Logistic Regression
\[\log\left(\frac{\pi}{1-\pi}\right) = \beta_{0} + \beta_{1}X_{1} + \cdots + \beta_{p}X_{p}\]
- binary classification
- coefficients \(\rightarrow\) interpretable estimates of how each predictor shifts the log-odds of the outcome
- optimizes conditional likelihood \(\displaystyle\prod_{n} p(y_{n}|\vec{x}_{n},\vec{\theta})\)
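The conditional-likelihood objective can be maximized directly. A minimal sketch using gradient ascent on \(\sum_n y_n \log \pi_n + (1-y_n)\log(1-\pi_n)\) (illustrative only; `fit_logreg` is a made-up name, and the learning rate and step count are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    # clip to avoid overflow in exp for strongly separated data
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logreg(X, y, lr=0.1, steps=2000):
    # Gradient ascent on the conditional log-likelihood; the gradient
    # of the Bernoulli log-likelihood w.r.t. beta is X^T (y - pi).
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        pi = sigmoid(Xb @ beta)
        beta += lr * Xb.T @ (y - pi) / len(y)
    return beta

# toy data: two overlapping but separable-in-expectation clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

beta = fit_logreg(X, y)
Xb = np.hstack([np.ones((len(X), 1)), X])
accuracy = ((sigmoid(Xb @ beta) > 0.5) == y).mean()
```

Note that only \(p(y_n|\vec{x}_n)\) enters the objective: unlike Naive Bayes, nothing is assumed about the distribution of \(\vec{x}\) itself.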