Bayes’ Theorem and the Law of Total Probability

By rearranging terms in the definitions of \(\mathbb{P}[A\mid B]\) and \(\mathbb{P}[B \mid A]\) we have Bayes’ theorem: \[ \mathbb{P}[A \mid B] = \frac{\mathbb{P}[B \mid A]\mathbb{P}[A]}{\mathbb{P}[B]} \] (for \(\mathbb{P}[B] > 0\)). This lets us swap \(A\) and \(B\) in conditional probability statements.
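To make the swap concrete, here is a minimal numeric sketch in Python for a hypothetical diagnostic test (all numbers invented): \(A\) is “has the disease”, \(B\) is “tests positive”, and Bayes’ theorem turns the test accuracy \(\mathbb{P}[B \mid A]\) into the quantity we actually care about, \(\mathbb{P}[A \mid B]\). Computing \(\mathbb{P}[B]\) here means splitting on disease/healthy, which previews the law of total probability below.

```python
# Flipping P[positive | disease] into P[disease | positive] via Bayes' theorem.
# All numbers are made up for illustration.

p_disease = 0.01               # P[A]: prior prevalence
p_pos_given_disease = 0.95     # P[B | A]: test sensitivity
p_pos_given_healthy = 0.10     # false-positive rate

# P[B]: split on the two complementary cases (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P[A | B] = P[B | A] * P[A] / P[B]
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # ~0.0876: still unlikely, despite a positive test
```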

The law of total probability lets us decompose an event into smaller events: let \(\{A_1, A_2, \dots, A_n\}\) be a partition of \(\Omega\) with \(\mathbb{P}[A_i] > 0\) for each \(i\). Then for any \(B \subseteq \Omega\) \[ \mathbb{P}[B] = \sum_{i=1}^{n}\mathbb{P}[B\mid A_i]\mathbb{P}[A_i] \] The proof is in the book, but the interpretation is that if the sample space can be chopped up into subsets \(A_i\), we can compute \(\mathbb{P}[B]\) by summing its portion in each subset: \(\mathbb{P}[B\cap A_i] = \mathbb{P}[B \mid A_i]\mathbb{P}[A_i]\).
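As a quick sanity check, here is a Python sketch of the decomposition, assuming a made-up partition of three factories \(A_1, A_2, A_3\) with invented defect rates \(\mathbb{P}[B \mid A_i]\):

```python
# P[B] from a partition: P[B] = sum_i P[B | A_i] * P[A_i].
# Hypothetical numbers: three factories and their defect rates.

p_A = [0.5, 0.3, 0.2]             # P[A_i]: partition of Omega, sums to 1
p_B_given_A = [0.01, 0.02, 0.05]  # P[B | A_i]: defect rate at each factory

p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))
print(p_B)  # 0.021: overall defect rate across all factories
```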

This allows us to write one of my favorite equations!

\[ \mathbb{P}[A_j \mid B] = \frac{\mathbb{P}[B \mid A_j]\mathbb{P}[A_j]}{\sum_{i=1}^{n}\mathbb{P}[B\mid A_i]\mathbb{P}[A_i]} \]

In Bayesian statistics, \(\mathbb{P}[A_j \mid B]\) is the posterior, \(\mathbb{P}[B \mid A_j]\) is the likelihood, and \(\mathbb{P}[A_j]\) is the prior. The denominator is the normalizing constant: by the law of total probability it equals \(\mathbb{P}[B]\), so this is just Bayes’ theorem with its denominator expanded over the partition.
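Putting both results together, here is a Python sketch of the posterior over a partition, reusing the hypothetical factory numbers from the previous example: given a defective item (\(B\)), which factory \(A_j\) most likely produced it?

```python
# Posterior P[A_j | B] over a partition, reusing the made-up factory numbers.

p_A = [0.5, 0.3, 0.2]             # priors P[A_j]
p_B_given_A = [0.01, 0.02, 0.05]  # likelihoods P[B | A_j]

# Numerators P[B | A_j] * P[A_j]; their sum is P[B], the normalizing constant
joint = [pb * pa for pb, pa in zip(p_B_given_A, p_A)]
p_B = sum(joint)

posterior = [j / p_B for j in joint]
print(posterior)  # [~0.238, ~0.286, ~0.476]: factory 3 is the likeliest source
```

Note that the numerators are exactly the terms the law of total probability sums over, so normalizing them into a posterior costs no extra work.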