8.1 Geostatistical data

Remember: areal data are discrete data. Geostatistical data are different:

  • A continuous phenomenon (e.g. density of mosquitoes) that is only recorded at specific locations (e.g. mosquito traps)

\{Z(s) : s \in D \subset \mathbb{R}^2\}

Z(s_1), \dots, Z(s_n): observations of Z (the spatial variable) at s_1, \dots, s_n (the locations)

8.1.1 Gaussian Random Fields (GRF)

A Gaussian random field (GRF) is a collection of random variables indexed by locations in a continuous domain (ex: latitude, longitude), where every finite collection of these random variables has a multivariate normal distribution.
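To make the definition concrete, here is a minimal base-R sketch that draws one realization of a zero-mean GRF at a finite set of locations. The locations, the exponential covariance (covered later in these notes) and the values of `sigma` and `k` are all assumptions picked for illustration.

```r
# Sketch: one realization of a zero-mean GRF observed at n locations
set.seed(1)
n <- 50
coords <- cbind(x = runif(n), y = runif(n))  # hypothetical locations in the unit square

# Assumed covariance: exponential decay with distance (illustrative sigma and k)
sigma <- 1
k <- 5
d <- as.matrix(dist(coords))                 # n x n matrix of pairwise distances
Sigma <- sigma^2 * exp(-k * d)               # covariance of (Z(s_1), ..., Z(s_n))

# chol() returns upper-triangular R with t(R) %*% R = Sigma,
# so if e ~ N(0, I) then t(R) %*% e ~ N(0, Sigma)
R <- chol(Sigma)
z <- drop(t(R) %*% rnorm(n))

head(cbind(coords, z))
```

Any finite set of locations gives a multivariate normal vector like `z`; the field itself is the idea that this works for every such set in D.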

Random process = stochastic process

By definition, it can’t be constant.

8.1.2 Stationarity

  • strictly/strongly stationary: a shift in location does not change the joint distribution of the random variables (ex: white noise)

  • weakly stationary (second order): the mean is constant over the domain D and the covariance depends only on the difference between locations (h):

Cov(Z(s),Z(s+h))=C(h)

C is a covariance function (see the Matérn model later)

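  • Intrinsically stationary: the variance of the difference between the values at two locations depends only on their separation (h) (and not on the locations themselves).

Var(Z(s+h) - Z(s)) = 2\gamma(h); 2\gamma(h) is called the variogram and \gamma(h) the semivariogram.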

Remember: Var(X) = E[(X - E(X))^2]

Var(Z(s+h) - Z(s)) = E\{[Z(s+h) - Z(s) - E(Z(s+h) - Z(s))]^2\}

We reorganize it:

2\gamma(h) = E\{[(Z(s+h) - E(Z(s+h))) - (Z(s) - E(Z(s)))]^2\}

2\gamma(h) = E\{(Z(s+h) - E(Z(s+h)))^2\} + E\{(Z(s) - E(Z(s)))^2\} - 2E\{(Z(s+h) - E(Z(s+h)))(Z(s) - E(Z(s)))\}

2\gamma(h) = Var(Z(s+h)) + Var(Z(s)) - 2Cov(Z(s+h), Z(s))

Remember that Var(Z(s+h)) = Var(Z(s)) = C(0):

2\gamma(h) = C(0) + C(0) - 2C(h)

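Then the semivariogram is composed of C(0) (the variance of the process, i.e. the sill that the semivariogram reaches once C(h) goes to 0) and C(h) (the spatial covariance function):

\gamma(h) = C(0) - C(h)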
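A quick numerical sanity check of this identity, as a sketch: simulate many pairs (Z(s), Z(s+h)) with a known covariance and compare half the variance of their difference with C(0) - C(h). The exponential form of C and the values of `sigma`, `k` and `h` are assumptions for illustration only.

```r
# Sketch: check gamma(h) = C(0) - C(h) by simulation
set.seed(42)
sigma <- 1; k <- 5; h <- 0.2         # illustrative values
C0 <- sigma^2                        # C(0): variance of the process
Ch <- sigma^2 * exp(-k * h)          # C(h): assumed exponential covariance

# Draw many pairs (Z(s), Z(s + h)) with covariance matrix Sigma
Sigma <- matrix(c(C0, Ch, Ch, C0), nrow = 2)
n_sim <- 1e5
z <- matrix(rnorm(2 * n_sim), ncol = 2) %*% chol(Sigma)

gamma_sim <- 0.5 * var(z[, 2] - z[, 1])  # half the variance of the increment
gamma_theory <- C0 - Ch

c(simulated = gamma_sim, theoretical = gamma_theory)
```

The two numbers should agree up to Monte Carlo error.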

We can also obtain the empirical semivariogram this way:

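\hat{\gamma}(h) = \frac{1}{2|N(h)|}\sum_{N(h)}(Z(s_i) - Z(s_j))^2

where N(h) is the set of pairs of locations (s_i, s_j) separated by h, and |N(h)| is the number of such pairs.

Isotropy/Anisotropy

Isotropy: the direction of h does not matter, only its length. Anisotropy: the direction of h matters too.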
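Under isotropy, the estimator \hat{\gamma}(h) above can be computed by binning pairs of locations on distance alone. Below is a rough base-R sketch on simulated data; the simulated field, the bin width of 0.05 and the 0.5 cutoff are arbitrary choices made for illustration, not values from the book.

```r
# Sketch: binned empirical semivariogram, assuming isotropy
set.seed(123)
n <- 200
coords <- cbind(runif(n), runif(n))          # hypothetical locations
d <- as.matrix(dist(coords))

# Simulated observations with an assumed exponential covariance exp(-5 * distance)
z <- drop(t(chol(exp(-5 * d))) %*% rnorm(n))

# All pairs i < j: their distance and squared difference
idx <- which(upper.tri(d), arr.ind = TRUE)
pair_dist <- d[idx]
sq_diff <- (z[idx[, 1]] - z[idx[, 2]])^2

# N(h): pairs whose distance falls in the same bin (direction is ignored)
breaks <- seq(0, 0.5, by = 0.05)
bin <- cut(pair_dist, breaks = breaks)       # pairs beyond 0.5 become NA and are dropped
gamma_hat <- tapply(sq_diff, bin, mean) / 2  # 1 / (2 |N(h)|) * sum of squared differences

semivar <- data.frame(
  h_mid = breaks[-1] - 0.025,                # bin midpoints
  gamma_hat = as.numeric(gamma_hat),
  n_pairs = as.integer(table(bin))           # |N(h)| per bin
)
semivar
```

Plotting `gamma_hat` against `h_mid` gives the usual semivariogram cloud summary; bins with few pairs are noisy, which is why `n_pairs` is kept alongside.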

From Ch8

TODO maybe: draw it with R

Let’s pick a distance d, and remember that cov = var * cor:

C(d) = \sigma^2 \rho(d), \quad d > 0

\rho(d) is a correlation function. The distance at which \rho(d) = 0 is called the range: it is the minimum distance beyond which observations are said to be “independent”. An exact zero is very hard to reach (many correlation functions only approach 0 asymptotically), so we use the effective range instead: the distance at which the correlation has dropped to a very low value (usually 0.05).
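As a small worked example, the effective range can be computed for an exponential correlation function \rho(d) = exp(-k d), which never reaches 0 exactly. The choice of this correlation function and of `k = 5` is an assumption for illustration.

```r
# Sketch: effective range at correlation 0.05 for an assumed exponential correlation
k <- 5                                 # illustrative decay parameter
rho <- function(d) exp(-k * d)

# Closed form: exp(-k * d) = 0.05  =>  d = -log(0.05) / k  (roughly 3 / k)
eff_range <- -log(0.05) / k

# Same idea numerically, handy when there is no closed form
eff_range_num <- uniroot(function(d) rho(d) - 0.05,
                         interval = c(0, 10), tol = 1e-10)$root

c(closed_form = eff_range, numeric = eff_range_num)
```

The rule of thumb "effective range is about 3 / k" for the exponential model comes from -log(0.05) being close to 3.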

8.1.3 Useful covariance functions

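They should never produce a negative variance (yup, a variance can only be non-negative): a valid covariance function has to be positive definite, so that any linear combination of the Z(s_i) has a non-negative variance.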

8.1.3.1 Exponential model

Cov(Z(s_i), Z(s_j)) = \sigma^2 \exp(-k||s_i - s_j||)

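h <- seq(from = 0, to = 1, by = .05)  # distances at which the covariance is evaluated
sigma <- 1                            # standard deviation, so C(0) = sigma^2

# Exponential covariance: sigma^2 * exp(-k * h)
expo_model <- function(sigma, k, h) {
  sigma^2 * exp(-k * h)
}

# k is the decay: the larger k, the faster the covariance drops with distance
k_10 <- expo_model(sigma, k = 10, h)
k_5 <- expo_model(sigma, k = 5, h)
k_1 <- expo_model(sigma, k = 1, h)

# One curve per value of k, told apart by line type
plot(h, k_10, type = "l", xlab = "h", ylab = "Cov", ylim = c(0, sigma^2))
lines(h, k_5, lty = 2)
lines(h, k_1, lty = 3)
legend("topright", legend = c("k = 10", "k = 5", "k = 1"), lty = 1:3)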
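The requirement that a covariance function never yields a negative variance can be checked numerically for the exponential model. The sketch below builds the covariance matrix at some made-up locations and looks at its eigenvalues; the locations, `sigma = 1` and `k = 5` are assumptions for illustration.

```r
# Sketch: the exponential model gives a positive definite covariance matrix
set.seed(7)
n <- 30
coords <- cbind(runif(n), runif(n))     # hypothetical locations
d <- as.matrix(dist(coords))            # pairwise distances ||s_i - s_j||
Sigma <- 1^2 * exp(-5 * d)              # sigma = 1, k = 5 (illustrative)

# Positive definite <=> all eigenvalues are strictly positive
eig <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
min(eig) > 0

# Equivalent view: the variance of any linear combination sum(a_i * Z(s_i))
# is the quadratic form t(a) %*% Sigma %*% a, which must not be negative
a <- rnorm(n)
quad_form <- drop(t(a) %*% Sigma %*% a)
quad_form
```

An arbitrary function of distance plugged in place of `exp(-5 * d)` can fail this check, which is why only a few families of covariance functions are used in practice.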

8.1.3.2 Matérn model

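Cov(Z(s_i), Z(s_j)) = \frac{\sigma^2}{2^{v-1}\Gamma(v)}(k||s_i - s_j||)^v K_v(k||s_i - s_j||)

K_v is the modified Bessel function of the second kind of order v; v > 0 controls the smoothness and k > 0 is a scale parameter. The effective range is here defined as \rho = \frac{\sqrt{8v}}{k} (that is, \frac{\sqrt{8}}{k} when v = 1), the distance at which the correlation is close to 0.1.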

TODO: understand it and INLA::inla.matern.cov()
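As a first step on that TODO, here is a minimal base-R sketch of the Matérn formula using `besselK()`. The helper name `matern_cov` is made up for these notes (it is not the INLA function), and the parameter values are illustrative. Two checks help build intuition: with v = 0.5 the Matérn model reduces to the exponential model, and at the range \sqrt{8v}/k the correlation is small, on the order of 0.1.

```r
# Sketch: Matern covariance in base R (matern_cov is a hypothetical helper)
matern_cov <- function(h, sigma, k, nu) {
  x <- k * h
  out <- sigma^2 / (2^(nu - 1) * gamma(nu)) * x^nu * besselK(x, nu)
  out[h == 0] <- sigma^2   # limit as h -> 0; the raw formula gives 0 * Inf = NaN there
  out
}

h <- seq(from = 0, to = 1, by = .05)

# Check 1: nu = 0.5 gives back the exponential model sigma^2 * exp(-k * h)
m_half <- matern_cov(h, sigma = 1, k = 5, nu = 0.5)
all.equal(m_half, exp(-5 * h))

# Check 2: correlation at the range sqrt(8 * nu) / k (sigma = 1, so cov = cor)
nu <- 1
k <- 5
range_rho <- sqrt(8 * nu) / k
cor_at_range <- matern_cov(range_rho, sigma = 1, k = k, nu = nu)
cor_at_range
```

Larger values of `nu` give smoother fields; the curves can be drawn with `plot()` and `lines()` in the same way as for the exponential model.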