Bayes’ Theorem and the Law of Total Probability

By rearranging terms in the definitions of \(\mathbb{P}[A\mid B]\) and \(\mathbb{P}[B \mid A]\) we have Bayes’ theorem: \[ \mathbb{P}[A \mid B] = \frac{\mathbb{P}[B \mid A]\mathbb{P}[A]}{\mathbb{P}[B]} \] (for \(\mathbb{P}[B] > 0\)). This lets us swap \(A\) and \(B\) in conditional probability statements.
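To make the swap concrete, here is a minimal numeric sketch in Python for a hypothetical diagnostic test (all numbers invented): \(A\) is “has the disease”, \(B\) is “tests positive”, and Bayes’ theorem turns the test accuracy \(\mathbb{P}[B \mid A]\) into the quantity we actually care about, \(\mathbb{P}[A \mid B]\). Computing \(\mathbb{P}[B]\) here means splitting on disease/healthy, which previews the law of total probability below.

```python
# Flipping P[positive | disease] into P[disease | positive] via Bayes' theorem.
# All numbers are made up for illustration.

p_disease = 0.01               # P[A]: prior prevalence
p_pos_given_disease = 0.95     # P[B | A]: test sensitivity
p_pos_given_healthy = 0.10     # false-positive rate

# P[B]: split on the two complementary cases (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P[A | B] = P[B | A] * P[A] / P[B]
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # ~0.0876: still unlikely, despite a positive test
```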

The law of total probability lets us decompose an event into smaller events: let \(\{A_1, A_2, \dots, A_n\}\) be a partition of \(\Omega\) with \(\mathbb{P}[A_i] > 0\) for each \(i\). Then for any \(B \subseteq \Omega\) \[ \mathbb{P}[B] = \sum_{i=1}^{n}\mathbb{P}[B\mid A_i]\mathbb{P}[A_i] \] The proof is in the book, but the interpretation is that if the sample space can be chopped up into subsets \(A_i\), we can compute \(\mathbb{P}[B]\) by summing its portion in each subset: \(\mathbb{P}[B\cap A_i] = \mathbb{P}[B \mid A_i]\mathbb{P}[A_i]\).
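As a quick sanity check, here is a Python sketch of the decomposition, assuming a made-up partition of three factories \(A_1, A_2, A_3\) with invented defect rates \(\mathbb{P}[B \mid A_i]\):

```python
# P[B] from a partition: P[B] = sum_i P[B | A_i] * P[A_i].
# Hypothetical numbers: three factories and their defect rates.

p_A = [0.5, 0.3, 0.2]             # P[A_i]: partition of Omega, sums to 1
p_B_given_A = [0.01, 0.02, 0.05]  # P[B | A_i]: defect rate at each factory

p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))
print(p_B)  # 0.021: overall defect rate across all factories
```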

This allows us to write one of my favorite equations!

\[ \mathbb{P}[A_j \mid B] = \frac{\mathbb{P}[B \mid A_j]\mathbb{P}[A_j]}{\sum_{i=1}^{n}\mathbb{P}[B\mid A_i]\mathbb{P}[A_i]} \]

In Bayesian statistics, \(\mathbb{P}[A_j \mid B]\) is the posterior, \(\mathbb{P}[B \mid A_j]\) is the likelihood, and \(\mathbb{P}[A_j]\) is the prior. The denominator is the normalizing constant: by the law of total probability it equals \(\mathbb{P}[B]\), so this is just Bayes’ theorem with its denominator expanded over the partition.
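Putting both results together, here is a Python sketch of the posterior over a partition, reusing the hypothetical factory numbers from the previous example: given a defective item (\(B\)), which factory \(A_j\) most likely produced it?

```python
# Posterior P[A_j | B] over a partition, reusing the made-up factory numbers.

p_A = [0.5, 0.3, 0.2]             # priors P[A_j]
p_B_given_A = [0.01, 0.02, 0.05]  # likelihoods P[B | A_j]

# Numerators P[B | A_j] * P[A_j]; their sum is P[B], the normalizing constant
joint = [pb * pa for pb, pa in zip(p_B_given_A, p_A)]
p_B = sum(joint)

posterior = [j / p_B for j in joint]
print(posterior)  # [~0.238, ~0.286, ~0.476]: factory 3 is the likeliest source
```

Note that the numerators are exactly the terms the law of total probability sums over, so normalizing them into a posterior costs no extra work.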