Bayes’ Theorem and the Law of Total Probability
By rearranging terms in the definitions of P[A∣B] and P[B∣A] we get Bayes’ theorem: $$P[A\mid B] = \frac{P[B\mid A]\,P[A]}{P[B]}$$ This lets us switch A and B in conditional probability statements.
The Law of Total Probability lets us decompose an event into smaller events: let {A₁, A₂, ..., Aₙ} be a partition of Ω. Then for any B ⊆ Ω, $$P[B] = \sum_{i=1}^{n} P[B\mid A_i]\,P[A_i].$$ The proof is in the book, but the interpretation is that if the sample space can be chopped up into subsets Aᵢ, we can compute P[B] by summing its portion in each of the subsets, since P[B ∩ Aᵢ] = P[B∣Aᵢ]P[Aᵢ].
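The sum above is easy to check numerically. Here is a minimal sketch with made-up numbers for a three-event partition (the specific probabilities are illustrative, not from the text):

```python
# Numeric check of the law of total probability.
# Hypothetical partition {A1, A2, A3} of the sample space.
p_A = [0.2, 0.5, 0.3]            # priors P[A_i]; a partition, so they sum to 1
p_B_given_A = [0.9, 0.4, 0.1]    # conditionals P[B | A_i]

# P[B] = sum over i of P[B | A_i] * P[A_i]
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))
print(p_B)  # 0.9*0.2 + 0.4*0.5 + 0.1*0.3 = 0.41
```

Each term `pb * pa` is exactly the piece P[B ∩ Aᵢ] from the interpretation above.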
This allows us to write one of my favorite equations!
$$P[A_j\mid B] = \frac{P[B\mid A_j]\,P[A_j]}{\sum_{i=1}^{n} P[B\mid A_i]\,P[A_i]}$$
In Bayesian statistics, P[Aⱼ∣B] is the posterior, P[B∣Aⱼ] is the likelihood, and P[Aⱼ] is the prior. The denominator is the normalization factor, which by the Law of Total Probability is just P[B].
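A minimal sketch of the posterior computation, using a hypothetical diagnostic-test setup (the disease rate and test accuracies are assumed numbers, not from the text): A₁ = "has disease", A₂ = "healthy", and B = "test positive".

```python
# Posterior via Bayes' theorem with the law-of-total-probability denominator.
prior = [0.01, 0.99]        # P[A_j]: hypothetical 1% disease prevalence
likelihood = [0.95, 0.05]   # P[B | A_j]: true-positive and false-positive rates

# Normalization factor: P[B] = sum over i of P[B | A_i] * P[A_i]
p_B = sum(l * p for l, p in zip(likelihood, prior))

# Posterior: P[A_j | B] = P[B | A_j] * P[A_j] / P[B]
posterior = [l * p / p_B for l, p in zip(likelihood, prior)]
print(posterior[0])  # P[disease | positive] ~ 0.161
```

Note how the small prior keeps the posterior low even with a fairly accurate test; this is the switch of A and B that Bayes’ theorem buys us.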