8.1 Geostatistical data

Remember: Areal data: discrete data

Geostatistical data: a continuous phenomenon (e.g. mosquito density) that is recorded only at specific locations (e.g. mosquito traps)

$\{Z(s) : s \in D \subset \mathbb{R}^2\}$

$Z(s_1), \dots, Z(s_n)$: observations of the spatial variable $Z$ at the locations $s_1, \dots, s_n$

8.1.1 Gaussian Random Fields (GRF)

A Gaussian random field (GRF) is a collection of random variables indexed by the locations of a continuous domain, where every finite collection of these random variables [ex: the values at a finite set of (latitude, longitude) points] has a multivariate normal distribution.
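To make the definition concrete, here is a minimal base R sketch that draws one realization of a zero-mean GRF at a finite set of locations. The exponential covariance, the parameter values (`sigma`, `k`) and the 10 x 10 grid are assumptions chosen for illustration only.

```r
set.seed(1)

# Hypothetical locations: a regular 10 x 10 grid on the unit square
grid <- expand.grid(x = seq(0, 1, length.out = 10),
                    y = seq(0, 1, length.out = 10))

# Pairwise Euclidean distances between locations
d <- as.matrix(dist(grid))

# Assumed covariance: exponential, sigma^2 * exp(-k * distance)
sigma <- 1
k <- 5
Sigma <- sigma^2 * exp(-k * d)

# A finite set of locations is multivariate normal: z = t(R) %*% e,
# where t(R) %*% R = Sigma (Cholesky factor) and e is standard normal
R <- chol(Sigma)
z <- drop(t(R) %*% rnorm(nrow(grid)))

# Display the realization as an image over the grid
image(matrix(z, nrow = 10), main = "One GRF realization")
```

Nearby cells tend to have similar values because their covariance is high; increasing `k` makes the field look rougher.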

Random process = stochastic process

.. by definition it can’t be constant

8.1.2 Stationarity

Strictly/strongly stationary: a shift in location does not change the joint distribution of the random variables (ex: white noise)
Weakly stationary (second order): the mean is constant over the domain $D$ and the covariance depends only on the difference between locations ( $h$ ):

$Cov(Z(s), Z(s+h)) = C(h)$

$C$ is a covariance function (see later with Matérn)

Intrinsically stationary: the variance of the difference between the values at two locations depends only on their separation ( $h$ ), not on the locations themselves.

$Var(Z(s+h) - Z(s)) = 2\gamma(h)$ is called the variogram, and $\gamma(h)$ the semivariogram.

Remember: $Var(X) = E[(X -E(X))^2]$

$Var(Z(s + h) - Z(s)) = E\{[Z(s+h) - Z(s) - E(Z(s+h) - Z(s))]^2\}$

We reorganize it:

$2 \gamma = E\{[(Z(s+h) - E(Z(s+h))) - (Z(s) - E(Z(s)))]^2\}$

$2\gamma = E\{(Z(s+h) - E(Z(s+h)))^2\} + E\{(Z(s) - E(Z(s)))^2\} - 2E\{(Z(s+h) - E(Z(s+h)))(Z(s) - E(Z(s)))\}$

$2\gamma = Var(Z(s+h)) + Var(Z(s)) - 2Cov(Z(s+h), Z(s))$

Remember that $Var(Z(s+h)) = Var(Z(s))$ :

$2 \gamma = C(0) + C(0) - 2C(h)$

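So the semivariogram is determined by $C(0)$ (the variance of the process, which $\gamma(h)$ approaches as $C(h) \to 0$) and $C(h)$ (the spatial covariance function). The nugget is a different quantity: the value that $\gamma(h)$ approaches as $h \to 0$.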

$\gamma = C(0) - C(h)$
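As a quick illustration of the relation $\gamma(h) = C(0) - C(h)$, the sketch below builds a semivariogram from a covariance function. The exponential form $C(h) = \sigma^2 \exp(-kh)$ and the values `sigma = 1`, `k = 5` are assumptions made only for this example.

```r
# Assumed covariance function: exponential with illustrative parameters
cov_fun <- function(h, sigma = 1, k = 5) sigma^2 * exp(-k * h)

h <- seq(0, 1, by = 0.05)

# Semivariogram derived from the covariance: gamma(h) = C(0) - C(h)
gamma_h <- cov_fun(0) - cov_fun(h)

plot(h, gamma_h, type = "l", ylab = "Semivariance")
abline(h = cov_fun(0), lty = 2)  # C(0): the level gamma approaches as C(h) -> 0
```

The curve starts at 0 and rises towards the dashed line at $C(0)$ as the covariance dies out.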

We can also obtain the empirical semivariogram this way:

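$\hat{\gamma}(h) = \frac{1}{2|N(h)|}\sum_{(i,j) \in N(h)}(Z(s_i) - Z(s_j))^2$

where $N(h)$ is the set of pairs of locations separated by $h$ and $|N(h)|$ is the number of such pairs.

Isotropy/Anisotropy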

Isotropy: only the length of $h$ matters, not its direction. Anisotropy: the direction of $h$ matters as well.
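Under isotropy, pairs of locations can be grouped by distance alone. The sketch below computes an empirical semivariogram in base R on simulated data. The locations, the values `z`, the distance bins and the helper name `empirical_semivariogram` are all hypothetical; real analyses would normally rely on a dedicated package.

```r
set.seed(42)

# Hypothetical data: 200 random locations with spatially unstructured values
n <- 200
coords <- cbind(x = runif(n), y = runif(n))
z <- rnorm(n)

empirical_semivariogram <- function(coords, z, breaks) {
  d <- as.matrix(dist(coords))           # pairwise distances
  sq_diff <- outer(z, z, "-")^2          # (Z(s_i) - Z(s_j))^2
  keep <- upper.tri(d)                   # count each pair only once
  bin <- cut(d[keep], breaks = breaks)   # N(h): pairs grouped by distance bin
  data.frame(
    h = head(breaks, -1) + diff(breaks) / 2,                  # bin midpoints
    n_pairs = as.vector(table(bin)),                          # |N(h)|
    gamma = as.vector(tapply(sq_diff[keep], bin, mean)) / 2   # mean sq. diff / 2
  )
}

sv <- empirical_semivariogram(coords, z, breaks = seq(0, 0.5, by = 0.05))
plot(sv$h, sv$gamma, xlab = "h", ylab = "Semivariance")
```

Because the simulated values have no spatial structure, the points should scatter around a flat line; spatially correlated data would instead show low semivariance at short distances.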

From Ch8

TODO maybe: draw it with R

Let's pick a distance $d$. Remember that covariance = variance × correlation:

$C(d) = \sigma^2 \rho(d), \quad d > 0$

$\rho(d)$ is a correlation function. The range is the minimum distance at which $\rho(d) = 0$, that is, the distance beyond which observations are said to be “independent”. This is very hard to get (for some models the correlation only approaches zero), so we use the effective range instead: the distance at which the correlation has dropped to a very low value (usually 0.05).
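For an assumed exponential correlation $\rho(d) = \exp(-kd)$, the correlation never reaches exactly 0, so only an effective range exists. Solving $\exp(-kd) = 0.05$ gives $d = -\log(0.05)/k \approx 3/k$. A short sketch with an illustrative decay `k = 5`:

```r
# Assumed correlation function: exponential decay with rate k
rho <- function(d, k) exp(-k * d)

k <- 5             # illustrative decay parameter
threshold <- 0.05  # correlation level that defines the effective range

# Closed form: exp(-k * d) = threshold  =>  d = -log(threshold) / k
effective_range <- -log(threshold) / k

# Same value found numerically, useful when no closed form is available
numeric_range <- uniroot(function(d) rho(d, k) - threshold,
                         interval = c(0, 10), tol = 1e-10)$root

curve(rho(x, k), from = 0, to = 1.5, xlab = "d", ylab = "Correlation")
abline(h = threshold, v = effective_range, lty = 2)
```

The dashed lines cross where the correlation curve hits the 0.05 threshold.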

8.1.3 Useful covariance functions

They must never produce a negative variance for any linear combination of the observations (yup, a variance cannot be negative), i.e. the function must be positive definite.

8.1.3.1 Exponential model

$Cov(Z(s_i), Z(s_j)) = \sigma^2 \exp(-k\lVert s_i - s_j\rVert)$

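# Distances at which the covariance is evaluated
h <- seq(from = 0, to = 1, by = 0.05)
sigma <- 1

# Exponential covariance: sigma^2 * exp(-k * h), where k is the decay
expo_model <- function(sigma, k, h) {
  sigma^2 * exp(-k * h)
}

# The larger k is, the faster the covariance decays with distance
k_10 <- expo_model(sigma, k = 10, h)
k_5 <- expo_model(sigma, k = 5, h)
k_1 <- expo_model(sigma, k = 1, h)

plot(h, k_10, type = "l", xlab = "h", ylab = "Cov")
lines(h, k_5, lty = 2)
lines(h, k_1, lty = 3)
legend("topright", legend = c("k = 10", "k = 5", "k = 1"), lty = 1:3)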
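The requirement that a covariance function never yields a negative variance can be checked numerically: the covariance matrix it produces at any set of locations must have no negative eigenvalues. The sketch below does this for the exponential model; the random locations, the weights `a` and the values `sigma = 1`, `k = 5` are illustrative assumptions.

```r
set.seed(123)

# Hypothetical locations in the unit square
coords <- cbind(runif(30), runif(30))
d <- as.matrix(dist(coords))

# Exponential covariance matrix with illustrative sigma = 1 and k = 5
Sigma <- 1^2 * exp(-5 * d)

# A valid covariance matrix has no negative eigenvalues
eig <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
min(eig)

# Variance of an arbitrary linear combination a'Z is a' Sigma a
a <- rnorm(30)
drop(t(a) %*% Sigma %*% a)
```

Both printed values should be positive; a function that produced a negative eigenvalue here could not be used as a covariance function.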

8.1.3.2 Matérn model

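$Cov(Z(s_i), Z(s_j)) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}(k\lVert s_i - s_j\rVert)^{\nu}K_{\nu}(k\lVert s_i-s_j\rVert)$

$K_{\nu}$ is the modified Bessel function of the second kind and $\nu > 0$ controls the smoothness. The effective range is defined here as $\rho = \frac{\sqrt{8\nu}}{k}$, which is $\frac{\sqrt{8}}{k}$ when $\nu = 1$; at that distance the correlation is roughly 0.1.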
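A minimal base R sketch of the Matérn covariance, using `besselK()` for the modified Bessel function of the second kind. The helper name `matern_cov` and the parameter values are hypothetical, and this is not meant to reproduce `INLA::inla.matern.cov()`. With a smoothness of 0.5 the Matérn model reduces to the exponential model, which gives a simple sanity check.

```r
# Matérn covariance as a function of distance h (hypothetical helper)
matern_cov <- function(h, sigma = 1, k = 5, nu = 1) {
  out <- rep(sigma^2, length(h))   # limit at h = 0 is the variance sigma^2
  pos <- h > 0                     # the formula itself is undefined at h = 0
  kh <- k * h[pos]
  out[pos] <- sigma^2 / (2^(nu - 1) * gamma(nu)) * kh^nu * besselK(kh, nu)
  out
}

h <- seq(0, 1, by = 0.01)

plot(h, matern_cov(h, nu = 0.5), type = "l", ylab = "Cov")
lines(h, matern_cov(h, nu = 1), lty = 2)
lines(h, matern_cov(h, nu = 2.5), lty = 3)
legend("topright", legend = c("nu = 0.5", "nu = 1", "nu = 2.5"), lty = 1:3)

# Sanity check: nu = 0.5 matches the exponential model sigma^2 * exp(-k * h)
all.equal(matern_cov(h, nu = 0.5), exp(-5 * h))

# Correlation left at the distance sqrt(8 * nu) / k (here nu = 1, k = 5)
matern_cov(sqrt(8) / 5, nu = 1)
```

Larger smoothness values give a flatter curve near the origin, which corresponds to smoother fields.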

TODO: understand it and INLA::inla.matern.cov()