NBC Math

MLEs

binary features

$\hat{\theta}_{dc} = \frac{N_{dc}}{N_{c}}$

discrete features

$\hat{\theta}_{dck} = \displaystyle\frac{N_{dck}}{N_{c}}$
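
The counting estimates above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `fit_discrete_mle`, the dictionary layout, and the toy data are all assumptions made for the example. For binary features, the entry with $k = 1$ plays the role of $\hat{\theta}_{dc}$.

```python
from collections import Counter

def fit_discrete_mle(X, y):
    """Return theta[(d, c, k)] = N_dck / N_c for categorical features."""
    n_c = Counter(y)      # N_c: number of training examples in class c
    n_dck = Counter()     # N_dck: examples in class c where feature d equals k
    for x, c in zip(X, y):
        for d, k in enumerate(x):
            n_dck[(d, c, k)] += 1
    # Unseen (d, c, k) combinations are simply absent, i.e. their MLE is 0.
    return {key: count / n_c[key[1]] for key, count in n_dck.items()}

# Toy data, invented for illustration: 2 features, classes "a" and "b".
X = [(0, 1), (0, 0), (1, 1), (1, 1)]
y = ["a", "a", "b", "b"]
theta = fit_discrete_mle(X, y)
print(theta[(1, "a", 1)])  # share of class "a" rows with feature 1 equal to 1
```

Combinations never seen in training get no entry at all, which is the zero-count problem that smoothing addresses.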

numerical features

$\begin{array}{rcl} \hat{\mu}_{dc} & = & \displaystyle\frac{1}{N_{c}} \displaystyle\sum_{n:y_{n} = c} x_{nd} \\ \hat{\sigma}_{dc}^{2} & = & \displaystyle\frac{1}{N_{c}} \displaystyle\sum_{n:y_{n} = c} (x_{nd} - \hat{\mu}_{dc})^{2} \end{array}$
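
As a rough sketch of the numerical case, the per-class mean and variance can be computed directly from the sums, normalizing by the class size (the number of terms in each sum). The helper name `fit_gaussian_mle` and the toy data are assumptions for illustration only.

```python
def fit_gaussian_mle(X, y):
    """Return {(d, c): (mu, var)} using the biased (divide-by-class-size) variance."""
    params = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n_c = len(rows)                      # number of examples in class c
        for d in range(len(rows[0])):
            col = [r[d] for r in rows]       # feature d restricted to class c
            mu = sum(col) / n_c
            var = sum((v - mu) ** 2 for v in col) / n_c
            params[(d, c)] = (mu, var)
    return params

# Toy data, invented for illustration: 2 numerical features, classes 0 and 1.
X = [(1.0, 2.0), (3.0, 2.0), (10.0, 0.0), (14.0, 4.0)]
y = [0, 0, 1, 1]
gauss = fit_gaussian_mle(X, y)
print(gauss[(0, 0)])  # (mean, variance) of feature 0 within class 0
```

Note that a feature that is constant within a class (feature 1 in class 0 here) gets variance 0, so a practical implementation would likely add a small floor to the variance.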

MAP: add-one smoothing

$\begin{array}{rcl} \bar{\theta}_{dc} & = & \displaystyle\frac{1 + N_{dc}}{2 + N_{c}} \\ p(y = c|\vec{x}, D) & \propto & \bar{\pi}_{c}\displaystyle\prod_{d}\prod_{k} \bar{\theta}_{dck}^{I(x_{d} = k)} \end{array}$
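
A minimal sketch of add-one smoothing for binary features follows. Here `n_dc` counts the class-$c$ examples with feature $d$ on and `n_c` is the class size. The notes do not define $\bar{\pi}_{c}$, so the smoothed class prior below (add one per class) is an assumption, as are the function names and toy data. The product is evaluated in log space, where each feature contributes only the factor selected by its indicator.

```python
import math

def fit_smoothed_binary(X, y):
    """Add-one smoothed estimates for binary features."""
    classes = sorted(set(y))
    pi, theta = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        n_c = len(rows)
        # Assumption: the class prior is smoothed the same way (add one per class).
        pi[c] = (1 + n_c) / (len(classes) + len(y))
        for d in range(len(X[0])):
            n_dc = sum(r[d] for r in rows)        # times feature d is on in class c
            theta[(d, c)] = (1 + n_dc) / (2 + n_c)
    return pi, theta

def posterior(x, pi, theta):
    """Normalized p(y = c | x) computed from log-probabilities."""
    log_p = {}
    for c in pi:
        lp = math.log(pi[c])
        for d, v in enumerate(x):
            t = theta[(d, c)]
            lp += math.log(t if v == 1 else 1 - t)  # only the matching factor counts
        log_p[c] = lp
    m = max(log_p.values())                          # subtract max for stability
    z = sum(math.exp(v - m) for v in log_p.values())
    return {c: math.exp(v - m) / z for c, v in log_p.items()}

# Toy data, invented for illustration.
X = [(1, 0), (1, 1), (0, 0)]
y = [1, 1, 0]
pi, theta = fit_smoothed_binary(X, y)
post = posterior((1, 1), pi, theta)
```

In this toy set, feature 0 is never on in class 0, so the MLE would be 0 and would veto class 0 for any input with that feature on; the smoothed estimate is 1/3 instead.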

Imputation

Suppose that the value of feature $x_{j}$ is missing at test time. A generative classifier can handle this by marginalizing $x_{j}$ out.

Gaussian discriminant analysis

$p(y=c|\vec{x}_{i \neq j}, \vec{\theta}) \propto p(y = c|\vec{\theta})\displaystyle\sum_{x_{j}} p(x_{j}, \vec{x}_{i \neq j}|y = c, \vec{\theta})$

Naive Bayes classifier

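$\displaystyle\sum_{x_{j}} p(x_{j}, \vec{x}_{i \neq j} | y = c, \vec{\theta}) = \left[\displaystyle\sum_{x_{j}} p(x_{j}|\vec{\theta}_{jc})\right] \displaystyle\prod_{i \neq j}^{D} p(x_{i}|\vec{\theta}_{ic}) = \displaystyle\prod_{i \neq j}^{D} p(x_{i}|\vec{\theta}_{ic})$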
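
In code, this means a naive Bayes classifier can simply skip the factor for a missing feature. The sketch below assumes binary features, marks a missing value with `None`, and uses made-up parameters; it also checks the shortcut against explicit summation over the missing feature. All names here are hypothetical.

```python
def joint(x, c, pi, theta):
    """Unnormalized p(y = c) * prod_d p(x_d | theta_dc) for fully observed binary x."""
    p = pi[c]
    for d, v in enumerate(x):
        t = theta[(d, c)]
        p *= t if v == 1 else 1 - t
    return p

def posterior_skip_missing(x, pi, theta):
    """Posterior over classes; features set to None are left out of the product."""
    score = {}
    for c in pi:
        s = pi[c]
        for d, v in enumerate(x):
            if v is None:
                continue  # the sum over x_d of p(x_d | theta_dc) equals 1
            t = theta[(d, c)]
            s *= t if v == 1 else 1 - t
        score[c] = s
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}

# Made-up parameters for 2 binary features and 2 classes.
pi = {0: 0.5, 1: 0.5}
theta = {(0, 0): 0.2, (1, 0): 0.5, (0, 1): 0.8, (1, 1): 0.5}

skipped = posterior_skip_missing((1, None), pi, theta)

# Explicit marginalization over the missing feature gives the same answer.
summed = {c: sum(joint((1, v), c, pi, theta) for v in (0, 1)) for c in pi}
z = sum(summed.values())
explicit = {c: s / z for c, s in summed.items()}
```

The shortcut costs nothing extra per missing feature, whereas a model with correlated features would need the full sum (or an integral for continuous features).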