3.1 Bayesian inference
Why Bayesian?
- Bayesian hierarchical models are often used for spatial / spatio-temporal data.
Quick note: we will define “hierarchical” later, but for now keep in mind that it has more than one layer.
- Flexibility in how estimates can borrow strength across space and time.
We need to specify:
- a likelihood: \(\pi(y|\theta)\), the probability distribution of the data \(y\) given the parameters \(\theta\)
- a prior distribution for the parameters: \(\pi(\theta|\eta)\), which may depend on hyperparameters \(\eta\)
Then there are two approaches:
- Fully Bayesian: specify a hyperprior for \(\eta\)
- Empirical Bayes: plug in estimates of \(\eta\) as if they were known
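In symbols, for a generic two-level structure (a sketch; the exact factorization depends on the model):

\[\text{Fully Bayesian: } \pi(\theta, \eta|y) \propto \pi(y|\theta)\pi(\theta|\eta)\pi(\eta)\]

\[\text{Empirical Bayes: } \pi(\theta|y, \hat{\eta}) \propto \pi(y|\theta)\pi(\theta|\hat{\eta}), \text{ where } \hat{\eta} \text{ is estimated from the data}\]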
If we know \(\eta\), we can do inference on \(\theta\) using Bayes’s theorem:

\[\pi(\theta|y) = \frac{\pi(y|\theta)\pi(\theta)}{\pi(y)}\]
Quick reminder: \(P(A\cap B) = P(A|B)P(B)\)
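To see where the theorem comes from, write the joint probability both ways and solve for the conditional:

\[\pi(\theta|y)\pi(y) = \pi(\theta, y) = \pi(y|\theta)\pi(\theta) \quad\Rightarrow\quad \pi(\theta|y) = \frac{\pi(y|\theta)\pi(\theta)}{\pi(y)}\]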
The denominator \(\pi(y) = \int\pi(y|\theta)\pi(\theta)\,d\theta\) is the marginal likelihood of the data \(y\). It ensures that \(\pi(\theta|y)\) is a proper probability distribution (it integrates to 1), which is why the marginal likelihood is also known as the normalizing constant.
3.1.1 A small analogy with R: garden of forking data
Source: chapter 2 of Statistical Rethinking by Richard McElreath
Another good read: chapter 2 of Bayes Rules! by Alicia A. Johnson, Miles Q. Ott and Mine Dogucu
A bag contains 4 marbles, each either blue (U) or white (W). We do not know the exact composition, but we know what our options (“parameters”) are, and we have data: we draw (with replacement) U / W / U.
<- c("WWWW", "UWWW", "UUWW", "UUUW", "UUUU")
Parameter <- c("0 x 4 x 0", "1 x 3 x 1", "2 x 2 x 2 ", "3 x 1 x 3", "4 x 0 x 4")
Paths_U_W_U <- data.frame(Parameter, Paths_U_W_U)
bille.dat $data <- c(0, 3, 8, 9, 0)
bille.dat::kable(bille.dat) knitr
Parameter | Paths_U_W_U | data |
---|---|---|
WWWW | 0 x 4 x 0 | 0 |
UWWW | 1 x 3 x 1 | 3 |
UUWW | 2 x 2 x 2 | 8 |
UUUW | 3 x 1 x 3 | 9 |
UUUU | 4 x 0 x 4 | 0 |
We draw another marble: U.
<- c("WWWW", "UWWW", "UUWW", "UUUW", "UUUU")
Parameter <- c(0, 1, 2, 3, 4)
Paths_U <- data.frame(Parameter, Paths_U)
bille2.dat $Prior <- c(0, 3, 8, 9, 0)
bille2.dat$New_posterior <- Paths_U * bille2.dat$Prior
bille2.dat::kable(bille2.dat) knitr
Parameter | Paths_U | Prior | New_posterior |
---|---|---|---|
WWWW | 0 | 0 | 0 |
UWWW | 1 | 3 | 3 |
UUWW | 2 | 8 | 16 |
UUUW | 3 | 9 | 27 |
UUUU | 4 | 0 | 0 |
We can go a step further with \(p\) (the parameter we want to know) as the proportion of blue marbles.
<- c("WWWW", "UWWW", "UUWW", "UUUW", "UUUU")
Parameter <- c(0, 0.25, 0.5, 0.75, 1)
p <- data.frame(Parameter, p)
tabl_def $Paths <- c( 0 , 3 , 8 , 9 , 0 )
tabl_def$Marginal_likelihood <- tabl_def$Paths/sum(tabl_def$Paths)
tabl_def::kable(tabl_def) knitr
Parameter | p | Paths | Marginal_likelihood |
---|---|---|---|
WWWW | 0.00 | 0 | 0.00 |
UWWW | 0.25 | 3 | 0.15 |
UUWW | 0.50 | 8 | 0.40 |
UUUW | 0.75 | 9 | 0.45 |
UUUU | 1.00 | 0 | 0.00 |
Here our distribution was discrete (marbles), but with a continuous distribution the idea is the same (the math is just harder to do!).
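To illustrate, here is a minimal grid-approximation sketch in base R, assuming a binomial likelihood for the U / W / U data (2 blue out of 3 draws) and a flat prior; evaluated only at \(p \in \{0, 0.25, 0.5, 0.75, 1\}\) it reproduces the table above.

```r
# Grid approximation for a continuous proportion p of blue marbles,
# assuming a binomial likelihood for the U / W / U data (2 blue out of 3 draws)
# and a flat prior.
p_grid <- seq(0, 1, length.out = 101)              # candidate values of p
prior <- rep(1, length(p_grid))                    # flat prior
likelihood <- dbinom(2, size = 3, prob = p_grid)   # probability of the data for each p
unnormalized <- likelihood * prior
posterior <- unnormalized / sum(unnormalized)      # divide by the normalizing constant
plot(p_grid, posterior, type = "l",
     xlab = "p (proportion of blue)", ylab = "posterior")
```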
Important point: the marginal likelihood does not depend on the parameters, so the posterior keeps the same shape without it (it is just proportional):
\[\pi(\theta|y) \propto \pi(y|\theta)\pi(\theta) \]
In this quick example we have just one parameter (proportion of blue = 1 - proportion of white), so we can “just” count, but very often calculating the posterior is much harder.
MCMC (Markov chain Monte Carlo) methods are used to draw samples of \(\theta\); we can then use these samples to approximate the posterior distribution of our parameters.
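To make the idea concrete, here is a minimal sketch of a random-walk Metropolis sampler for the marble proportion \(p\), assuming the same U / W / U data and a flat prior; it only illustrates the core mechanism, not what the software below does internally.

```r
# Minimal random-walk Metropolis sampler for p (proportion of blue marbles),
# assuming 2 blue draws out of 3 and a flat prior on [0, 1].
set.seed(1)
log_post <- function(p) {
  if (p < 0 || p > 1) return(-Inf)                 # outside the prior support
  dbinom(2, size = 3, prob = p, log = TRUE)        # log-likelihood (flat prior is constant)
}
n_iter <- 5000
p_sample <- numeric(n_iter)
p_current <- 0.5                                   # starting value
for (i in 1:n_iter) {
  p_prop <- p_current + rnorm(1, mean = 0, sd = 0.2)  # propose a small random move
  if (log(runif(1)) < log_post(p_prop) - log_post(p_current)) {
    p_current <- p_prop                            # accept the proposal
  }
  p_sample[i] <- p_current                         # otherwise keep the current value
}
hist(p_sample, breaks = 30, main = "Posterior samples of p", xlab = "p")
```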
Implementations of MCMC:
- WinBUGS
- JAGS
- Stan
You need to run diagnostics on the samples created with MCMC (traceplots, effective sample size ratio, etc.).
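For instance, assuming the coda package (not mentioned above, just one common option) and the p_sample draws from the Metropolis sketch:

```r
# Basic MCMC diagnostics with the coda package (an assumption for illustration),
# applied to the p_sample draws from the sketch above.
library(coda)
chain <- mcmc(p_sample)
traceplot(chain)        # visual check of mixing
effectiveSize(chain)    # effective sample size of the chain
```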
MCMC methods are great but can be computationally heavy!
For latent Gaussian models, a less computationally intensive alternative is INLA (Integrated Nested Laplace Approximation).
INLA website: http://www.r-inla.org
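As a rough illustration of the interface, here is a minimal sketch with a made-up Gaussian toy model (assuming the INLA package is installed from the website above):

```r
# Toy example of the R-INLA interface (INLA package from http://www.r-inla.org);
# a simple Gaussian regression on simulated data, just to show the formula-based workflow.
library(INLA)
df <- data.frame(x = runif(100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 0.5)
fit <- inla(y ~ x, family = "gaussian", data = df)
summary(fit)
```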