22.8 Computational efficiency in Stan
Stan uses “Hamiltonian Monte Carlo” (HMC) by default, a Markov chain Monte Carlo algorithm that moves through parameter space in a sequence of random steps.
This is an iterative and stochastic process.
By default, stan_glm runs 4 chains and saves 1000 post-warmup draws from each, giving 4000 draws in total.
Diagnostics can help evaluate the simulation:
- R-hat - compares the variation between chains with the variation within chains. If it is not near 1, the chains have not fully mixed.
- n_eff - the effective number of samples (successive draws are correlated because each one is generated from the previous one). ‘Usually’ n_eff > 400 is sufficient.
- mcse - “Monte Carlo standard error” - additional uncertainty due to the stochastic algorithm; negligible in all examples in this book.
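To make the diagnostics above concrete, here is a minimal base-R sketch. It is an illustration under stated assumptions, not Stan's implementation: the draws are simulated with rnorm() as a stand-in for sampler output, the helper names rhat() and mcse() are made up for this example, and the R-hat formula is the classic non-split version (Stan reports a more refined split, rank-normalized variant).

```r
# Simulated stand-in for sampler output: 4 chains of 1000 draws of one parameter.
set.seed(123)
n_draws <- 1000
n_chains <- 4
draws <- matrix(rnorm(n_draws * n_chains), nrow = n_draws, ncol = n_chains)

# Classic (non-split) R-hat: compares between-chain and within-chain variance.
# Hypothetical helper, simplified relative to what Stan computes.
rhat <- function(x) {
  n <- nrow(x)
  B <- n * var(colMeans(x))            # between-chain variance
  W <- mean(apply(x, 2, var))          # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n  # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

# Chains drawn from the same distribution: R-hat should be close to 1.
rhat_mixed <- rhat(draws)

# Shift one chain so the chains disagree: R-hat rises well above 1.
shifted <- draws
shifted[, 1] <- shifted[, 1] + 3
rhat_unmixed <- rhat(shifted)

# Monte Carlo standard error of a posterior mean: posterior sd / sqrt(n_eff).
mcse <- function(posterior_sd, n_eff) posterior_sd / sqrt(n_eff)
mcse_example <- mcse(posterior_sd = 1, n_eff = 400)

print(c(rhat_mixed = rhat_mixed, rhat_unmixed = rhat_unmixed, mcse = mcse_example))
```

With n_eff = 400 the Monte Carlo standard error is 1/20 of the posterior standard deviation, which is one way to see why that threshold is usually considered enough.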
With larger and more complex data sets and/or more predictors, computation speed can be a limiting factor. Some options:
Parallel processing - rstan can take advantage of multiple processors if they are available.
options(mc.cores = parallel::detectCores())
Mode-based approximations - stan_glm can be made as fast as glm while retaining the advantages of Bayesian inference by approximating the full Bayesian calculation. One method (“optimizing”) uses a normal approximation centered at the posterior mode. See demo ‘Scalability’ in tidy-ros.
Other algorithms (‘variational inference’) are available but beyond the scope of this book.
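The first two options can be tried side by side. The sketch below assumes the rstanarm package is installed and uses simulated data. The names fake, fit_mcmc and fit_mode are made up for this illustration and do not come from the book's demo. Capping the number of cores at 4 is a choice made here because the default fit runs 4 chains.

```r
library(rstanarm)

# Use up to one core per chain; fall back to 1 if detection fails.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1
options(mc.cores = min(4, n_cores))

# Simulated data, for illustration only: y = 1 + 2x + noise.
set.seed(1)
n <- 1000
fake <- data.frame(x = rnorm(n))
fake$y <- 1 + 2 * fake$x + rnorm(n)

# Default algorithm: full HMC sampling (4 chains, 1000 saved draws each).
fit_mcmc <- stan_glm(y ~ x, data = fake, refresh = 0)

# Mode-based approximation: normal approximation centered at the posterior mode.
fit_mode <- stan_glm(y ~ x, data = fake, algorithm = "optimizing")

# For a simple, well-identified model the two sets of estimates should be similar.
print(coef(fit_mcmc))
print(coef(fit_mode))
```

On a small problem like this the speed difference is minor; the approximation becomes more useful as the number of observations or predictors grows.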