5.4 Spatial small area disease risk estimation
\[Y\sim{Po(E_{i} \theta_{i})}, i=1,..,n,\] \[log(\theta_i)=\alpha+u_i+v_i\]
\(\alpha\) is the overall risk, \(u_i\) the random effect in area i, and \(v_i\) is some noise \(v_i\sim{N(0,\sigma_v^{2})}\)
To include covariates to quantify risk factors and other random effects:
\(\alpha=\mathbf{d_i \beta}\)
The Besag-York-Mollié (BYM) model (Besag, York, and Mollié 1991) \(u_i\), which is the spatial random effect, has a Conditional Autoregressive (CAR) distribution:
\[u_i|\mathbf{u_{-i}}\sim{N(\bar{u_{\sigma_i},\frac{\sigma_u^2}{n_{\sigma_i}}})}\]
\(\bar{u}_{\sigma_i}=n_{\sigma_i}^{-1}\sum_{j\epsilon\sigma_i}{u_j}\)
In R-INLA, the formula of the BYM model is specified as follows:
\[Y\sim{f_1(besag)+f_2(iid)}\]
<- Y ~
formula f(idareau, model = "besag", # CAR distribution
graph = g, # neighborhood structure
scale.model = TRUE) +
f(idareav, model = "iid")
(The BYM model can also be specified with the model “bym”)
5.4.1 Spatial modeling of lung cancer in Pennsylvania
To calculate the relative risks of lung cancer in the Pennsylvania counties, a new parametrization of the BYM model is called BYM2, bym2 model, it defines the components as:
\[\mathbf{b}=\frac{1}{\sqrt{\pi_b}(\sqrt{1-\phi}\mathit{v}_*+\sqrt{\phi}\mathit{u_*})}\]
\(\pi_b\) is the precision parameter and controls the marginal variance of the weighted sum of \(\mathit{u_*}\) and \(\mathit{v_*}\), \(\phi\) is between 0 and 1, and measures the proportion of the marginal variance explained by \(\mathit{u_*}\)
\[\text{BYM2}=\left\{\begin{matrix} \phi=1 & \text{one spatial model} \\ \phi=0 & \text{one unstructured spatial noise} \end{matrix}\right.\]
\[Y\sim{f(.)}\]
<- Y ~ f(idarea, # is the index variable of the area
formula model = "bym2",
graph = g)
$idarea <- 1:nrow(map@data) map
As a rule of thumb, given the PC prior for the marginal precision \(\pi_b\):
\[P((1/\sqrt{\pi_b})>U)=\alpha\]
we use: \[U=0.5/0.31\] and \[\alpha=0.01\]
<- list(
prior prec = list(
prior = "pc.prec",
param = c(0.5 / 0.31, 0.01)),
phi = list(
prior = "pc",
param = c(0.5, 2 / 3))
)
<- poly2nb(map)
nb head(nb)
## [[1]]
## [1] 21 28 67
##
## [[2]]
## [1] 3 4 10 63 65
##
## [[3]]
## [1] 2 10 16 32 33 65
##
## [[4]]
## [1] 2 10 37 63
##
## [[5]]
## [1] 7 11 29 31 56
##
## [[6]]
## [1] 15 36 38 39 46 54
::nb2INLA("map.adj", nb)
spdep
<- INLA::inla.read.graph(filename = "map.adj") g
# it requires BiocManager::install("Rgraphviz")
plot(g)
<- Y ~ f(idarea,
formula model = "bym2",
graph = g,
hyper = prior)
And fit the model:
<- inla(formula,
res family = "poisson",
data = map@data,
E = E,
control.predictor = list(compute = TRUE)
)
summary(res)
##
## Call:
## c("inla.core(formula = formula, family = family, contrasts = contrasts,
## ", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
## scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
## ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
## verbose, ", " lincomb = lincomb, selection = selection, control.compute
## = control.compute, ", " control.predictor = control.predictor,
## control.family = control.family, ", " control.inla = control.inla,
## control.fixed = control.fixed, ", " control.mode = control.mode,
## control.expert = control.expert, ", " control.hazard = control.hazard,
## control.lincomb = control.lincomb, ", " control.update =
## control.update, control.lp.scale = control.lp.scale, ", "
## control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
## ", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
## num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
## working.directory = working.directory, ", " silent = silent, inla.mode
## = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
## .parent.frame)")
## Time used:
## Pre = 7.85, Running = 0.233, Post = 0.0264, Total = 8.11
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -0.051 0.017 -0.085 -0.051 -0.018 -0.05 0
##
## Random effects:
## Name Model
## idarea BYM2 model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for idarea 130.004 54.276 55.553 119.60 265.184 101.50
## Phi for idarea 0.478 0.255 0.058 0.47 0.928 0.29
##
## Marginal log-Likelihood: -225.07
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
head(res$summary.fitted.values)
## mean sd 0.025quant 0.5quant 0.975quant
## fitted.Predictor.01 0.8669640 0.06830565 0.7372532 0.8654205 1.005983
## fitted.Predictor.02 1.0680081 0.02865210 1.0130592 1.0675715 1.125435
## fitted.Predictor.03 0.9109486 0.06870064 0.7742302 0.9118266 1.043970
## fitted.Predictor.04 0.9977557 0.05802206 0.8878789 0.9963214 1.115951
## fitted.Predictor.05 0.9091276 0.07084270 0.7744446 0.9074647 1.054075
## fitted.Predictor.06 0.9956106 0.04749906 0.9070281 0.9939495 1.093505
## mode
## fitted.Predictor.01 0.8627566
## fitted.Predictor.02 1.0666945
## fitted.Predictor.03 0.9144297
## fitted.Predictor.04 0.9936197
## fitted.Predictor.05 0.9048641
## fitted.Predictor.06 0.9905917
$RR <- res$summary.fitted.values[, "mean"]
map$LL <- res$summary.fitted.values[, "0.025quant"]
map$UL <- res$summary.fitted.values[, "0.975quant"]
map
summary(map@data[, c("RR", "LL", "UL")])
## RR LL UL
## Min. :0.8409 Min. :0.7160 Min. :0.9454
## 1st Qu.:0.9138 1st Qu.:0.7810 1st Qu.:1.0480
## Median :0.9446 Median :0.8191 Median :1.0838
## Mean :0.9550 Mean :0.8330 Mean :1.0886
## 3rd Qu.:0.9904 3rd Qu.:0.8795 3rd Qu.:1.1324
## Max. :1.1470 Max. :1.0897 Max. :1.2594
<- st_as_sf(map)
mapsf
<- ggplot(mapsf) + geom_sf(aes(fill = RR)) +
gRR scale_fill_gradient2(
midpoint = 1, low = "blue", mid = "white", high = "red",
limits = c(0.7, 1.5)
+
) theme_bw()
<- ggplot(mapsf) + geom_sf(aes(fill = LL)) +
gLL scale_fill_gradient2(
midpoint = 1, low = "blue", mid = "white", high = "red",
limits = c(0.7, 1.5)
+
) theme_bw()
<- ggplot(mapsf) + geom_sf(aes(fill = UL)) +
gUL scale_fill_gradient2(
midpoint = 1, low = "blue", mid = "white", high = "red",
limits = c(0.7, 1.5)
+
) theme_bw()
::plot_grid(gRR, gLL, gUL, ncol = 1) cowplot
The posterior mean of the BYm2 random effect \(\mathbf{b}\):
$re <- res$summary.random$idarea[1:67, "mean"]
mapsf
ggplot(mapsf) + geom_sf(aes(fill = re)) +
scale_fill_gradient2(
midpoint = 0, low = "blue", mid = "white", high = "red"
+
) theme_bw()