5.4 Spatial small area disease risk estimation

\[Y\sim{Po(E_{i} \theta_{i})}, i=1,..,n,\] \[log(\theta_i)=\alpha+u_i+v_i\]

\(\alpha\) is the overall risk, \(u_i\) the random effect in area i, and \(v_i\) is some noise \(v_i\sim{N(0,\sigma_v^{2})}\)

To include covariates to quantify risk factors and other random effects:

\(\alpha=\mathbf{d_i \beta}\)

The Besag-York-Mollié (BYM) model (Besag, York, and Mollié 1991) \(u_i\), which is the spatial random effect, has a Conditional Autoregressive (CAR) distribution:

\[u_i|\mathbf{u_{-i}}\sim{N(\bar{u_{\sigma_i},\frac{\sigma_u^2}{n_{\sigma_i}}})}\]

\(\bar{u}_{\sigma_i}=n_{\sigma_i}^{-1}\sum_{j\epsilon\sigma_i}{u_j}\)

In R-INLA, the formula of the BYM model is specified as follows:

\[Y\sim{f_1(besag)+f_2(iid)}\]

formula <- Y ~
  f(idareau, model = "besag", # CAR distribution
    graph = g,                # neighborhood structure
    scale.model = TRUE) +
  f(idareav, model = "iid")

(The BYM model can also be specified with the model “bym”)

5.4.1 Spatial modeling of lung cancer in Pennsylvania

To calculate the relative risks of lung cancer in the Pennsylvania counties, a new parametrization of the BYM model is called BYM2, bym2 model, it defines the components as:

\[\mathbf{b}=\frac{1}{\sqrt{\pi_b}(\sqrt{1-\phi}\mathit{v}_*+\sqrt{\phi}\mathit{u_*})}\]

\(\pi_b\) is the precision parameter and controls the marginal variance of the weighted sum of \(\mathit{u_*}\) and \(\mathit{v_*}\), \(\phi\) is between 0 and 1, and measures the proportion of the marginal variance explained by \(\mathit{u_*}\)

\[\text{BYM2}=\left\{\begin{matrix} \phi=1 & \text{one spatial model} \\ \phi=0 & \text{one unstructured spatial noise} \end{matrix}\right.\]

\[Y\sim{f(.)}\]

formula <- Y ~ f(idarea, # is the index variable of the area
                 model = "bym2", 
                 graph = g)
map$idarea <- 1:nrow(map@data)

As a rule of thumb, given the PC prior for the marginal precision \(\pi_b\):

\[P((1/\sqrt{\pi_b})>U)=\alpha\]

we use: \[U=0.5/0.31\] and \[\alpha=0.01\]

prior <- list(
  prec = list(
    prior = "pc.prec",
    param = c(0.5 / 0.31, 0.01)),
  phi = list(
    prior = "pc",
    param = c(0.5, 2 / 3))
  )
nb <- poly2nb(map)
head(nb)
## [[1]]
## [1] 21 28 67
## 
## [[2]]
## [1]  3  4 10 63 65
## 
## [[3]]
## [1]  2 10 16 32 33 65
## 
## [[4]]
## [1]  2 10 37 63
## 
## [[5]]
## [1]  7 11 29 31 56
## 
## [[6]]
## [1] 15 36 38 39 46 54
spdep::nb2INLA("map.adj", nb)

g <- INLA::inla.read.graph(filename = "map.adj")
# it requires BiocManager::install("Rgraphviz")
plot(g)

formula <- Y ~ f(idarea, 
                 model = "bym2", 
                 graph = g, 
                 hyper = prior)

And fit the model:

res <- inla(formula,
  family = "poisson", 
  data = map@data,
  E = E, 
  control.predictor = list(compute = TRUE)
)
summary(res)
## 
## Call:
##    c("inla.core(formula = formula, family = family, contrasts = contrasts, 
##    ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
##    scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
##    ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
##    verbose, ", " lincomb = lincomb, selection = selection, control.compute 
##    = control.compute, ", " control.predictor = control.predictor, 
##    control.family = control.family, ", " control.inla = control.inla, 
##    control.fixed = control.fixed, ", " control.mode = control.mode, 
##    control.expert = control.expert, ", " control.hazard = control.hazard, 
##    control.lincomb = control.lincomb, ", " control.update = 
##    control.update, control.lp.scale = control.lp.scale, ", " 
##    control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
##    ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
##    num.threads, ", " blas.num.threads = blas.num.threads, keep = keep, 
##    working.directory = working.directory, ", " silent = silent, inla.mode 
##    = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame = 
##    .parent.frame)") 
## Time used:
##     Pre = 7.85, Running = 0.233, Post = 0.0264, Total = 8.11 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant  mode kld
## (Intercept) -0.051 0.017     -0.085   -0.051     -0.018 -0.05   0
## 
## Random effects:
##   Name     Model
##     idarea BYM2 model
## 
## Model hyperparameters:
##                         mean     sd 0.025quant 0.5quant 0.975quant   mode
## Precision for idarea 130.004 54.276     55.553   119.60    265.184 101.50
## Phi for idarea         0.478  0.255      0.058     0.47      0.928   0.29
## 
## Marginal log-Likelihood:  -225.07 
##  is computed 
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
head(res$summary.fitted.values)
##                          mean         sd 0.025quant  0.5quant 0.975quant
## fitted.Predictor.01 0.8669640 0.06830565  0.7372532 0.8654205   1.005983
## fitted.Predictor.02 1.0680081 0.02865210  1.0130592 1.0675715   1.125435
## fitted.Predictor.03 0.9109486 0.06870064  0.7742302 0.9118266   1.043970
## fitted.Predictor.04 0.9977557 0.05802206  0.8878789 0.9963214   1.115951
## fitted.Predictor.05 0.9091276 0.07084270  0.7744446 0.9074647   1.054075
## fitted.Predictor.06 0.9956106 0.04749906  0.9070281 0.9939495   1.093505
##                          mode
## fitted.Predictor.01 0.8627566
## fitted.Predictor.02 1.0666945
## fitted.Predictor.03 0.9144297
## fitted.Predictor.04 0.9936197
## fitted.Predictor.05 0.9048641
## fitted.Predictor.06 0.9905917
map$RR <- res$summary.fitted.values[, "mean"]
map$LL <- res$summary.fitted.values[, "0.025quant"]
map$UL <- res$summary.fitted.values[, "0.975quant"]

summary(map@data[, c("RR", "LL", "UL")])
##        RR               LL               UL        
##  Min.   :0.8409   Min.   :0.7160   Min.   :0.9454  
##  1st Qu.:0.9138   1st Qu.:0.7810   1st Qu.:1.0480  
##  Median :0.9446   Median :0.8191   Median :1.0838  
##  Mean   :0.9550   Mean   :0.8330   Mean   :1.0886  
##  3rd Qu.:0.9904   3rd Qu.:0.8795   3rd Qu.:1.1324  
##  Max.   :1.1470   Max.   :1.0897   Max.   :1.2594
mapsf <- st_as_sf(map)

gRR <- ggplot(mapsf) + geom_sf(aes(fill = RR)) +
  scale_fill_gradient2(
    midpoint = 1, low = "blue", mid = "white", high = "red",
    limits = c(0.7, 1.5)
  ) +
  theme_bw()
gLL <- ggplot(mapsf) + geom_sf(aes(fill = LL)) +
  scale_fill_gradient2(
    midpoint = 1, low = "blue", mid = "white", high = "red",
    limits = c(0.7, 1.5)
  ) +
  theme_bw()
gUL <- ggplot(mapsf) + geom_sf(aes(fill = UL)) +
  scale_fill_gradient2(
    midpoint = 1, low = "blue", mid = "white", high = "red",
    limits = c(0.7, 1.5)
  ) +
  theme_bw()
cowplot::plot_grid(gRR, gLL, gUL, ncol = 1)

The posterior mean of the BYm2 random effect \(\mathbf{b}\):

mapsf$re <- res$summary.random$idarea[1:67, "mean"]

ggplot(mapsf) + geom_sf(aes(fill = re)) +
  scale_fill_gradient2(
    midpoint = 0, low = "blue", mid = "white", high = "red"
  ) +
  theme_bw()