3.1 Bayesian inference
Why Bayesian?
- Bayesian hierarchical models are often used for spatial / spatio-temporal data.
Quick note: we will define “hierarchical” later, but for now keep in mind that it has more than one layer.
- Flexibility in how estimates can borrow strength across space and time.
We need to specify:
- a likelihood: \(\pi(y|\theta)\), the probability distribution of the data \(y\) given the parameters \(\theta\)
- a prior distribution for the parameters: \(\pi(\theta|\eta)\), which may depend on hyperparameters \(\eta\)
Then there are two approaches:
- Fully Bayesian: specify a hyperprior for \(\eta\)
- Empirical Bayes: plug in estimates of \(\eta\) as if they were known
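In symbols, for a generic two-level structure (a sketch; the exact factorization depends on the model):

\[\text{Fully Bayesian: } \pi(\theta, \eta|y) \propto \pi(y|\theta)\pi(\theta|\eta)\pi(\eta)\]

\[\text{Empirical Bayes: } \pi(\theta|y, \hat{\eta}) \propto \pi(y|\theta)\pi(\theta|\hat{\eta}), \text{ where } \hat{\eta} \text{ is estimated from the data}\]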
If we know \(\eta\), we can do inference on \(\theta\) using Bayes’s theorem:

\[\pi(\theta|y) = \frac{\pi(y|\theta)\pi(\theta)}{\pi(y)}\]
Quick reminder: \(P(A\cap B) = P(A|B)P(B)\)
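To see where the theorem comes from, write the joint probability both ways and solve for the conditional:

\[\pi(\theta|y)\pi(y) = \pi(\theta, y) = \pi(y|\theta)\pi(\theta) \quad\Rightarrow\quad \pi(\theta|y) = \frac{\pi(y|\theta)\pi(\theta)}{\pi(y)}\]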
The denominator \(\pi(y) = \int\pi(y|\theta)\pi(\theta)\,d\theta\) is the marginal likelihood of the data \(y\). It ensures that \(\pi(\theta|y)\) is a proper probability distribution (it integrates to 1), which is why the marginal likelihood is also known as the normalizing constant.
3.1.1 A small analogy with R: garden of forking data
Source: chapter 2 of Statistical Rethinking by Richard McElreath
Another good read: chapter 2 of Bayes Rules! by Alicia A. Johnson, Miles Q. Ott and Mine Dogucu
A bag contains 4 marbles, each either blue (U) or white (W). We do not know the exact composition, but we know what our options (“parameters”) are, and we have data: we draw (with replacement) U / W / U.
<- c("WWWW", "UWWW", "UUWW", "UUUW", "UUUU")
Parameter <- c("0 x 4 x 0", "1 x 3 x 1", "2 x 2 x 2 ", "3 x 1 x 3", "4 x 0 x 4")
Paths_U_W_U <- data.frame(Parameter, Paths_U_W_U)
bille.dat $data <- c(0, 3, 8, 9, 0)
bille.dat::kable(bille.dat) knitr
Parameter | Paths_U_W_U | data |
---|---|---|
WWWW | 0 x 4 x 0 | 0 |
UWWW | 1 x 3 x 1 | 3 |
UUWW | 2 x 2 x 2 | 8 |
UUUW | 3 x 1 x 3 | 9 |
UUUU | 4 x 0 x 4 | 0 |
We draw another marble: U.
<- c("WWWW", "UWWW", "UUWW", "UUUW", "UUUU")
Parameter <- c(0, 1, 2, 3, 4)
Paths_U <- data.frame(Parameter, Paths_U)
bille2.dat $Prior <- c(0, 3, 8, 9, 0)
bille2.dat$New_posterior <- Paths_U * bille2.dat$Prior
bille2.dat::kable(bille2.dat) knitr
Parameter | Paths_U | Prior | New_posterior |
---|---|---|---|
WWWW | 0 | 0 | 0 |
UWWW | 1 | 3 | 3 |
UUWW | 2 | 8 | 16 |
UUUW | 3 | 9 | 27 |
UUUU | 4 | 0 | 0 |
We can go a step further with \(p\) (the parameter we want to know) as the proportion of blue marbles.
<- c("WWWW", "UWWW", "UUWW", "UUUW", "UUUU")
Parameter <- c(0, 0.25, 0.5, 0.75, 1)
p <- data.frame(Parameter, p)
tabl_def $Paths <- c( 0 , 3 , 8 , 9 , 0 )
tabl_def$Marginal_likelihood <- tabl_def$Paths/sum(tabl_def$Paths)
tabl_def::kable(tabl_def) knitr
Parameter | p | Paths | Marginal_likelihood |
---|---|---|---|
WWWW | 0.00 | 0 | 0.00 |
UWWW | 0.25 | 3 | 0.15 |
UUWW | 0.50 | 8 | 0.40 |
UUUW | 0.75 | 9 | 0.45 |
UUUU | 1.00 | 0 | 0.00 |
Here our distribution was discrete (marbles), but with a continuous distribution the idea is the same (the math is just harder to do!).
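To illustrate, here is a minimal grid-approximation sketch in base R, assuming a binomial likelihood for the U / W / U data (2 blue out of 3 draws) and a flat prior; evaluated only at \(p \in \{0, 0.25, 0.5, 0.75, 1\}\) it reproduces the table above.

```r
# Grid approximation for a continuous proportion p of blue marbles,
# assuming a binomial likelihood for the U / W / U data (2 blue out of 3 draws)
# and a flat prior.
p_grid <- seq(0, 1, length.out = 101)              # candidate values of p
prior <- rep(1, length(p_grid))                    # flat prior
likelihood <- dbinom(2, size = 3, prob = p_grid)   # probability of the data for each p
unnormalized <- likelihood * prior
posterior <- unnormalized / sum(unnormalized)      # divide by the normalizing constant
plot(p_grid, posterior, type = "l",
     xlab = "p (proportion of blue)", ylab = "posterior")
```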
Important point: the marginal likelihood does not depend on the parameters, so the posterior keeps the same shape without it (it is just proportional):
\[\pi(\theta|y) \propto \pi(y|\theta)\pi(\theta) \]
In this quick example we have just one parameter (proportion of blue = 1 - proportion of white), so we can “just” count, but very often calculating the posterior is much harder.
MCMC (Markov chain Monte Carlo) methods are used to draw samples of \(\theta\); we can then use these samples to approximate the posterior distribution of our parameters.
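To make the idea concrete, here is a minimal sketch of a random-walk Metropolis sampler for the marble proportion \(p\), assuming the same U / W / U data and a flat prior; it only illustrates the core mechanism, not what the software below does internally.

```r
# Minimal random-walk Metropolis sampler for p (proportion of blue marbles),
# assuming 2 blue draws out of 3 and a flat prior on [0, 1].
set.seed(1)
log_post <- function(p) {
  if (p < 0 || p > 1) return(-Inf)                 # outside the prior support
  dbinom(2, size = 3, prob = p, log = TRUE)        # log-likelihood (flat prior is constant)
}
n_iter <- 5000
p_sample <- numeric(n_iter)
p_current <- 0.5                                   # starting value
for (i in 1:n_iter) {
  p_prop <- p_current + rnorm(1, mean = 0, sd = 0.2)  # propose a small random move
  if (log(runif(1)) < log_post(p_prop) - log_post(p_current)) {
    p_current <- p_prop                            # accept the proposal
  }
  p_sample[i] <- p_current                         # otherwise keep the current value
}
hist(p_sample, breaks = 30, main = "Posterior samples of p", xlab = "p")
```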
Implementations of MCMC:
- WinBUGS
- JAGS
- Stan
You need to run diagnostics on the samples created with MCMC (traceplots, effective sample size ratio, etc.).
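For instance, assuming the coda package (not mentioned above, just one common option) and the p_sample draws from the Metropolis sketch:

```r
# Basic MCMC diagnostics with the coda package (an assumption for illustration),
# applied to the p_sample draws from the sketch above.
library(coda)
chain <- mcmc(p_sample)
traceplot(chain)        # visual check of mixing
effectiveSize(chain)    # effective sample size of the chain
```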
MCMC methods are great but can be computationally heavy!
For latent Gaussian models, a less computationally intensive alternative is INLA (Integrated Nested Laplace Approximation).
INLA website: http://www.r-inla.org
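As a rough illustration of the interface, here is a minimal sketch with a made-up Gaussian toy model (assuming the INLA package is installed from the website above):

```r
# Toy example of the R-INLA interface (INLA package from http://www.r-inla.org);
# a simple Gaussian regression on simulated data, just to show the formula-based workflow.
library(INLA)
df <- data.frame(x = runif(100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 0.5)
fit <- inla(y ~ x, family = "gaussian", data = df)
summary(fit)
```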