8.1 Geostatistical data
Remember: areal data are discrete data. Geostatistical data, in contrast:
- Describe a continuous phenomenon (e.g. mosquito density) that is recorded only at specific locations (e.g. mosquito traps)
\[\{Z(s) : s \in D \subset \mathbb{R}^2\}\]
\(Z(s_1), \ldots, Z(s_n)\) are the observations of the spatial variable \(Z\) at the locations \(s_1, \ldots, s_n\).
8.1.1 Gaussian Random Fields (GRF)
A Gaussian random field (GRF) is a collection of random variables indexed by a continuous domain (ex: locations given by latitude and longitude), where every finite collection of these random variables has a multivariate normal distribution.
Random process = stochastic process: by definition its values are random, so it can’t be constant.
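Since every finite collection of a GRF is multivariate normal, one realization at a finite set of locations can be drawn from a single multivariate normal distribution. The following is a minimal sketch in base R, not a reference implementation. The unit-square domain, the number of locations, the parameter values, and the choice of an exponential covariance (the model from section 8.1.3.1) are all assumptions made for illustration.

```r
# Sketch: simulate one realization of a zero-mean GRF at n locations.
# Assumptions (not from the notes): unit-square domain, exponential
# covariance, sigma = 1, k = 5, n = 50.
set.seed(42)
n <- 50
coords <- cbind(x = runif(n), y = runif(n))  # locations s_1, ..., s_n in D

sigma <- 1
k <- 5
d <- as.matrix(dist(coords))      # pairwise Euclidean distances
Sigma <- sigma^2 * exp(-k * d)    # covariance matrix of (Z(s_1), ..., Z(s_n))

# chol() returns an upper-triangular R with t(R) %*% R == Sigma, so
# t(R) %*% e with e ~ N(0, I) has covariance matrix Sigma.
R <- chol(Sigma)
z <- drop(t(R) %*% rnorm(n))

head(cbind(coords, z))
```

Each run with a different seed gives a different realization of the same field; the vector `z` plays the role of \(Z(s_1), ..., Z(s_n)\).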
8.1.2 Stationarity
- Strictly (strongly) stationary: a shift in location does not change the joint distribution of the random variables (ex: white noise)
- Weakly (second-order) stationary: the mean is constant over the domain \(D\) and the covariance depends only on the difference between locations (\(h\)):
\[Cov(Z(s), Z(s+h)) = C(h) \]
\(C\) is a covariance function (see later with Matérn)
- Intrinsically stationary: the variance of the difference between the values at two locations depends only on their separation (\(h\)), not on the locations themselves.
\(Var(Z(s+h) - Z(s)) = 2\gamma(h)\) is called the variogram, and \(\gamma(h)\) the semivariogram.
Remember: \(Var(X) = E[(X -E(X))^2]\)
\[ Var(Z(s + h) - Z(s)) = E\{[Z(s+h) - Z(s) - E(Z(s+h) - Z(s))]^2\}\]
We reorganize it:
\[2 \gamma = E\{[(Z(s+h) - E(Z(s+h))) - (Z(s) - E(Z(s)))]^2\}\]
\[2\gamma = E\{(Z(s+h) - E(Z(s+h)))^2\} + E\{(Z(s) - E(Z(s)))^2\} \\ - 2E\{(Z(s+h) - E(Z(s+h)))(Z(s) - E(Z(s)))\}\]
\[2\gamma = Var(Z(s+h)) + Var(Z(s)) - 2Cov(Z(s+h), Z(s)) \]
Remember that, under weak stationarity, \(Var(Z(s+h)) = Var(Z(s)) = C(0)\) and \(Cov(Z(s+h), Z(s)) = C(h)\):
\[ 2 \gamma = C(0) + C(0) - 2C(h)\]
Then the semivariogram is the difference between \(C(0)\) (the variance of the process, which is the sill of the semivariogram rather than its nugget) and \(C(h)\) (the spatial covariance function):
\[ \gamma = C(0) - C(h) \]
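The identity \(\gamma = C(0) - C(h)\) can be checked numerically. The sketch below is only an illustration: the values of `C0` and `Ch` are hypothetical, and the two vectors stand in for \(Z(s)\) and \(Z(s+h)\) observed over many independent replicates.

```r
# Sketch: Monte Carlo check of gamma = C(0) - C(h).
# C0 and Ch are hypothetical values chosen for illustration.
set.seed(1)
n  <- 1e5
C0 <- 1.0        # C(0): common variance of Z(s) and Z(s+h)
Ch <- 0.6        # C(h): covariance between Z(s) and Z(s+h)
rho <- Ch / C0   # implied correlation

# Build two correlated normal variables with variance C0 and covariance Ch.
z_s  <- rnorm(n, sd = sqrt(C0))                                # plays Z(s)
z_sh <- rho * z_s + sqrt(1 - rho^2) * rnorm(n, sd = sqrt(C0))  # plays Z(s+h)

gamma_hat    <- var(z_sh - z_s) / 2   # half the variance of the difference
gamma_theory <- C0 - Ch               # semivariogram from the identity

c(estimate = gamma_hat, theory = gamma_theory)
```

With this many replicates the estimate should land close to the theoretical value of 0.4.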
We can also obtain the empirical semivariogram this way:
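\[\hat{\gamma}(h) = \frac{1}{2|N(h)|}\sum_{N(h)}(Z(s_i) - Z(s_j))^2 \]
where \(N(h)\) is the set of pairs of locations \((s_i, s_j)\) separated by \(h\), and \(|N(h)|\) is the number of such pairs.
8.1.2.1 Isotropy/Anisotropy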
Isotropy: only the length of \(h\) matters, not its direction. Anisotropy: the direction of \(h\) matters as well.
TODO maybe: draw it with R
Let’s pick a distance \(d\), and remember that covariance = variance × correlation:
\[C(d) = \sigma^2 \rho(d), \quad d > 0 \]
\(\rho(d)\) is a correlation function. The distance at which \(\rho(d) = 0\) is called the range: it is the minimum distance at which observations are said to be “independent”. This is very hard to obtain (many correlation functions only approach zero without ever reaching it), so we use the effective range instead: the distance at which the correlation has dropped to a very low value (usually 0.05).
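Under isotropy, the empirical semivariogram \(\hat{\gamma}(h)\) defined earlier can be computed by grouping pairs of locations by distance only. The following base R sketch is one possible way to do it, not the method of any particular package. The function name `empirical_semivariogram`, its arguments, and the use of distance bins (`breaks`) to form \(N(h)\) are assumptions made for illustration; binning is used because exact distances rarely repeat in scattered data.

```r
# Sketch: isotropic empirical semivariogram with distance bins.
# `empirical_semivariogram` and `breaks` are hypothetical names.
empirical_semivariogram <- function(coords, z, breaks) {
  d       <- as.matrix(dist(coords))   # pairwise distances between locations
  sq_diff <- outer(z, z, "-")^2        # (Z(s_i) - Z(s_j))^2 for all pairs
  pairs   <- upper.tri(d)              # count each pair (i, j) only once
  bin     <- cut(d[pairs], breaks = breaks)
  data.frame(
    h       = as.vector(tapply(d[pairs], bin, mean)),            # mean distance per bin
    gamma   = as.vector(tapply(sq_diff[pairs], bin, mean)) / 2,  # 1 / (2|N(h)|) * sum
    n_pairs = as.vector(table(bin))                              # |N(h)|
  )
}

# Usage on white noise (no spatial structure), an assumed toy example:
set.seed(7)
coords <- cbind(runif(200), runif(200))
z <- rnorm(200)
sv <- empirical_semivariogram(coords, z, breaks = seq(0, 1, by = 0.1))
sv
```

For white noise the estimated values should stay roughly flat around the variance of `z`; for spatially correlated data they would instead rise with distance and level off near the range.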
8.1.3 Useful covariance functions
They must never produce a negative variance (yup, a variance can only be non-negative); in other words, a valid covariance function has to be positive semidefinite.
8.1.3.1 Exponential model
\[Cov(Z(s_i), Z(s_j)) = \sigma^2 \exp(-k \lVert s_i - s_j \rVert) \]
h <- seq(from = 0, to = 1, by = .05)  # distances between locations
sigma <- 1                            # standard deviation
expo_model <- function(sigma, k, h) {
  sigma^2 * exp(-k * h)
}
# k is the decay
k_10 <- expo_model(sigma, k = 10, h)
k_5 <- expo_model(sigma, k = 5, h)
k_1 <- expo_model(sigma, k = 1, h)
plot(h, k_10, type = "l", ylab = "Cov" )
lines(h, k_5, lty = 2)
lines(h, k_1, lty = 3)
legend("topright", legend = c("k = 10", "k = 5", "k = 1"), lty = 1:3)
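For the exponential model, the effective range described earlier has a closed form: solving \(\exp(-k d) = 0.05\) gives \(d = -\log(0.05) / k\), roughly \(3 / k\). The sketch below computes it for the three decay values used in the plot; the helper name `effective_range` and its `threshold` argument are hypothetical.

```r
# Sketch: effective range of the exponential correlation rho(d) = exp(-k * d).
# `effective_range` is a hypothetical helper, not part of any package.
effective_range <- function(k, threshold = 0.05) {
  -log(threshold) / k   # distance at which exp(-k * d) equals threshold
}

k_values <- c(10, 5, 1)
eff <- effective_range(k_values)

# Correlation evaluated at the effective range: should equal the threshold.
data.frame(k = k_values, effective_range = eff, rho = exp(-k_values * eff))
```

A larger decay \(k\) gives a shorter effective range. For \(k = 1\) the effective range is about 3, which lies beyond the distances from 0 to 1 shown in the plot, so that curve never gets close to zero there.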