8.1 Geostatistical data

Remember: Areal data: discrete data

  • Geostatistical data: a continuous phenomenon (e.g. mosquito density) that is only recorded at specific locations (e.g. mosquito traps)

\[\{Z(s) : s \in D \subset \mathbb{R}^2\}\]

\(Z(s_1), ..., Z(s_n)\) are the observations of the spatial variable \(Z\) at the locations \(s_1, ..., s_n\).

8.1.1 Gaussian Random Fields (GRF)

A Gaussian random field (GRF) is a collection of random variables indexed by a continuous domain (e.g. latitude, longitude), where every finite collection of these random variables has a multivariate normal distribution.

Random process = stochastic process

.. by definition it can’t be constant
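To make the definition concrete, here is a minimal sketch that draws one realization of a GRF on a small grid using base R only. The grid size, the exponential covariance and the values of `sigma` and `k` are illustrative assumptions, not taken from the notes. Any finite set of grid points gets a multivariate normal distribution whose covariance matrix is built from the distances between the points.

```r
set.seed(1)

# Locations: a regular 15 x 15 grid on the unit square (assumed layout)
grid <- expand.grid(x = seq(0, 1, length.out = 15),
                    y = seq(0, 1, length.out = 15))

# Pairwise Euclidean distances between all locations
d <- as.matrix(dist(grid))

# Assumed covariance: sigma^2 * exp(-k * distance), illustrative values
sigma <- 1
k <- 5
Sigma <- sigma^2 * exp(-k * d)

# Draw from N(0, Sigma): if Sigma = t(R) %*% R, then t(R) %*% e ~ N(0, Sigma)
R <- chol(Sigma)
z <- as.vector(t(R) %*% rnorm(nrow(grid)))

# Show the realization as an image
image(matrix(z, nrow = 15), main = "One GRF realization")
```

Re-running with a different seed gives a different surface: each draw is one realization of the same random field.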

8.1.2 Stationarity

  • Strictly (strongly) stationary: a shift in location does not change the joint distribution of the random variables (e.g. white noise)

  • Weakly (second-order) stationary: the mean is constant over the domain \(D\) and the covariance depends only on the difference between locations (\(h\)):

\[Cov(Z(s), Z(s+h)) = C(h) \]

\(C\) is a covariance function (see the Matérn model later).

  • Intrinsically stationary: the variance of the difference between the values at two locations depends only on their separation (\(h\)), not on the locations themselves.

\(Var(Z(s+h) - Z(s)) = 2\gamma(h)\) is called the variogram, and \(\gamma(h)\) the semivariogram.

Remember: \(Var(X) = E[(X -E(X))^2]\)

\[ Var(Z(s + h) - Z(s)) = E\{[Z(s+h) - Z(s) - E(Z(s+h) - Z(s))]^2\}\]

We reorganize it:

\[2 \gamma = E\{[(Z(s+h) - E(Z(s+h))) - (Z(s) - E(Z(s)))]^2\}\]

\[\begin{aligned} 2\gamma = {} & E\{(Z(s+h) - E(Z(s+h)))^2\} + E\{(Z(s) - E(Z(s)))^2\} \\ & - 2E\{(Z(s+h) - E(Z(s+h)))(Z(s) - E(Z(s)))\} \end{aligned}\]

\[2\gamma = Var(Z(s+h)) + Var(Z(s)) - 2Cov(Z(s+h), Z(s)) \]

Under second-order stationarity, \(Var(Z(s+h)) = Var(Z(s)) = C(0)\) and \(Cov(Z(s+h), Z(s)) = C(h)\):

\[ 2 \gamma = C(0) + C(0) - 2C(h)\]

The semivariogram is therefore composed of \(C(0)\) (the variance, which is the sill that the semivariogram levels off at when \(C(h)\) goes to 0) and \(C(h)\) (the spatial covariance function):

\[ \gamma = C(0) - C(h) \]
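The relation between the semivariogram and the covariance can be checked numerically. The sketch below assumes an exponential covariance function with illustrative parameters (`cov_fun` is a hypothetical helper). It simulates many pairs with variance \(C(0)\) and covariance \(C(h)\), then compares half the variance of their differences with \(C(0) - C(h)\).

```r
set.seed(42)

# Assumed covariance function (exponential, illustrative parameters)
cov_fun <- function(h, sigma = 1, k = 5) sigma^2 * exp(-k * h)

h <- 0.2
C0 <- cov_fun(0)
Ch <- cov_fun(h)

# Theoretical semivariogram at lag h
gamma_theory <- C0 - Ch

# Simulate many pairs (Z(s), Z(s + h)) with variance C(0) and covariance C(h)
n <- 1e5
Sigma <- matrix(c(C0, Ch, Ch, C0), nrow = 2)
pairs_z <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Sigma)

# Half the variance of the differences estimates gamma(h)
gamma_sim <- 0.5 * var(pairs_z[, 2] - pairs_z[, 1])

c(theory = gamma_theory, simulated = gamma_sim)
```

The two numbers should agree up to simulation noise.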

We can also obtain the empirical semivariogram this way:

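\[\hat{\gamma}(h) = \frac{1}{2|N(h)|}\sum_{N(h)}(Z(s_i) - Z(s_j))^2 \]

where \(N(h)\) is the set of pairs of locations separated by \(h\) and \(|N(h)|\) is the number of such pairs.

8.1.2.1 Isotropy/Anisotropy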

Isotropy: the direction of \(h\) does not matter, only its length. Anisotropy: the direction of \(h\) matters as well.
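Under isotropy, the empirical semivariogram can be computed by grouping pairs of locations by distance only. Here is a minimal base R sketch: `emp_semivariogram` is a hypothetical helper, and the toy data (random locations, white-noise values) and the bin width are assumptions made for illustration.

```r
set.seed(123)

# Hypothetical helper: isotropic empirical semivariogram, binned by distance
emp_semivariogram <- function(coords, z, breaks) {
  d <- as.matrix(dist(coords))            # pairwise distances ||s_i - s_j||
  sq_diff <- outer(z, z, "-")^2           # (Z(s_i) - Z(s_j))^2
  keep <- upper.tri(d)                    # count each pair once
  bins <- cut(d[keep], breaks = breaks)   # N(h): pairs grouped by distance
  data.frame(
    h = as.vector(tapply(d[keep], bins, mean)),   # mean distance per bin
    n_pairs = as.vector(table(bins)),             # |N(h)|
    gamma = as.vector(tapply(sq_diff[keep], bins, mean)) / 2
  )
}

# Toy data: random locations and spatially unstructured values (white noise)
coords <- cbind(x = runif(200), y = runif(200))
z <- rnorm(200)

sv <- emp_semivariogram(coords, z, breaks = seq(0, 0.5, by = 0.05))
plot(sv$h, sv$gamma, xlab = "h", ylab = "Semivariance", ylim = c(0, 2))
```

Because the toy values have no spatial structure, the estimated semivariance should stay roughly flat around the variance of `z`. Spatially correlated data would instead start low and rise with \(h\).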

From Ch8

TODO maybe: draw it with R

Let's pick a distance \(d\), and remember that covariance = variance × correlation:

\[C(d) = \sigma^2 \rho(d), \quad d > 0 \]

\(\rho(d)\) is a correlation function. The distance at which \(\rho(d) = 0\) is called the range: it is the minimum distance at which observations are said to be “independent”. This is very hard to get (many correlation functions only approach 0 without ever reaching it), so we use the effective range, the distance at which the correlation has dropped to a very low value (usually 0.05).
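As a small worked example, the effective range can be computed for an assumed exponential correlation function \(\rho(d) = \exp(-kd)\). The function `rho` and the value of `k` are illustrative choices. For this model the answer has a closed form, and `uniroot()` gives a generic numerical alternative for correlation functions without one.

```r
# Assumed exponential correlation function with decay k (illustrative)
rho <- function(d, k) exp(-k * d)

k <- 5
target <- 0.05  # correlation level defining the effective range

# Closed form for this model: exp(-k * d) = target  =>  d = -log(target) / k
eff_range_exact <- -log(target) / k

# Generic numerical alternative: find d where rho(d) - target crosses zero
eff_range_num <- uniroot(function(d) rho(d, k) - target,
                         interval = c(0, 10), tol = 1e-10)$root

c(exact = eff_range_exact, numeric = eff_range_num)
```

Since \(-\log(0.05) \approx 3\), the effective range of this exponential model is roughly \(3/k\).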

8.1.3 Useful covariance functions

A valid covariance function must never produce a negative variance (a variance can only be non-negative): every covariance matrix built from it has to be positive semi-definite.
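This requirement can be checked numerically for a given set of locations. The sketch below assumes an exponential covariance with illustrative parameters and random locations. It checks that the variance of an arbitrary linear combination of the \(Z(s_i)\) is not negative, and that the smallest eigenvalue of the covariance matrix is not negative either.

```r
set.seed(7)

# Random locations in the unit square (assumed toy layout)
coords <- cbind(runif(50), runif(50))
d <- as.matrix(dist(coords))

# Covariance matrix from an assumed exponential covariance function
Sigma <- 1^2 * exp(-5 * d)

# For any weights a, Var(sum a_i Z(s_i)) = t(a) %*% Sigma %*% a must be >= 0
a <- rnorm(50)
var_comb <- as.numeric(t(a) %*% Sigma %*% a)

# Equivalent check: all eigenvalues of Sigma are non-negative
min_eig <- min(eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values)

c(var_of_combination = var_comb, smallest_eigenvalue = min_eig)
```

A check like this only covers one set of locations, so it can reveal an invalid function but cannot prove that a function is valid in general.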

8.1.3.1 Exponential model

\[Cov(Z(s_i), Z(s_j)) = \sigma^2 \exp(-k\|s_i - s_j\|) \]

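h <- seq(from = 0, to = 1, by = .05)  # distances between two locations
sigma <- 1                            # standard deviation (sigma^2 is the variance)

# Exponential covariance: sigma^2 * exp(-k * h)
expo_model <- function(sigma, k, h) {
  sigma^2 * exp(-k * h)
}

# k is the decay rate: the larger k, the faster the covariance drops with distance
k_10 <- expo_model(sigma, k = 10, h)
k_5 <- expo_model(sigma, k = 5, h)
k_1 <- expo_model(sigma, k = 1, h)

plot(h, k_10, type = "l", ylab = "Cov")
lines(h, k_5, lty = 2)
lines(h, k_1, lty = 3)
legend("topright", legend = c("k = 10", "k = 5", "k = 1"), lty = 1:3)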

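8.1.3.2 Matérn model

\[Cov(Z(s_i), Z(s_j)) = \frac{\sigma^2}{2^{v-1}\Gamma(v)}(k\|s_i - s_j\|)^v K_v(k\|s_i-s_j\|)\]

\(K_v\) is the modified Bessel function of the second kind of order \(v\), and \(v > 0\) controls the smoothness of the field. The effective range is defined here as \(\rho = \frac{\sqrt{8v}}{k}\), the distance at which the correlation is close to 0.1. For \(v = 1\) this gives \(\rho = \frac{\sqrt{8}}{k}\).

TODO: understand it and INLA::inla.matern.cov()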
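As a first step towards this TODO, the Matérn formula can be written directly in base R with `besselK()`, without INLA. This is only a sketch: `matern_model` is a hypothetical helper mirroring `expo_model`, its argument `nu` stands for the \(v\) of the formula, and the parameter values are illustrative. It checks that the Matérn model with \(v = 0.5\) reduces to the exponential model, and computes the correlation at the range \(\sqrt{8v}/k\) for \(v = 1\), which equals \(\sqrt{8}/k\).

```r
# Hypothetical helper: Matern covariance written as in the formula above
# nu = smoothness (v in the formula), k = scale, h = distance
matern_model <- function(sigma, k, nu, h) {
  out <- sigma^2 / (2^(nu - 1) * gamma(nu)) * (k * h)^nu * besselK(k * h, nu)
  out[h == 0] <- sigma^2  # limit as h -> 0 (the raw formula gives 0 * Inf)
  out
}

h <- seq(from = 0, to = 1, by = 0.05)

# With nu = 0.5 the Matern model reduces to the exponential model
max_diff <- max(abs(matern_model(1, k = 5, nu = 0.5, h) - exp(-5 * h)))

# Correlation at the range sqrt(8 * nu) / k (here nu = 1, sigma = 1)
k <- 5
nu <- 1
cor_at_range <- matern_model(1, k, nu, sqrt(8 * nu) / k)

c(max_diff = max_diff, cor_at_range = cor_at_range)

# Larger nu gives a smoother covariance near h = 0
plot(h, matern_model(1, k = 5, nu = 0.5, h), type = "l", ylab = "Cov")
lines(h, matern_model(1, k = 5, nu = 1, h), lty = 2)
lines(h, matern_model(1, k = 5, nu = 2, h), lty = 3)
```

The value of `cor_at_range` should come out small but not zero, which is why this distance is only an effective range.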