22.8 Computational efficiency in Stan

Stan uses “Hamiltonian Monte Carlo” by default, which generates a Markov chain of draws that moves through parameter space, each draw depending on the one before.
- This is an iterative and stochastic process.
- By default it runs 4 chains of 1000 draws each (after warmup), giving 4000 draws in total.
- Diagnostics can help evaluate the simulation:
  - R-hat - compares the variation within and between chains. If it is not near 1 (a common rule of thumb is below 1.1), the chains have not fully mixed.
  - n_eff - the effective number of independent draws, which is smaller than the actual number because successive draws are correlated (a consequence of the iterative simulation). ‘Usually’ n_eff > 400 is sufficient.
  - mcse - “Monte Carlo standard error” - the additional uncertainty due to the stochastic algorithm; it is negligible in all examples in this book.
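To make the R-hat idea concrete, here is a minimal base-R sketch of the classic (non-split) Gelman-Rubin R-hat for a single parameter, applied to simulated draws rather than a real Stan fit. The helper name rhat_basic and the simulated chains are hypothetical; Stan itself reports a refined version (for example, it splits each chain in half), so its numbers will differ slightly.

```r
# Hypothetical helper: classic (non-split) Gelman-Rubin R-hat for one parameter.
# `draws` is an n x m matrix: n draws (rows) from each of m chains (columns).
rhat_basic <- function(draws) {
  n <- nrow(draws)
  B <- n * var(colMeans(draws))          # between-chain variance
  W <- mean(apply(draws, 2, var))        # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(123)
n <- 1000   # draws per chain, matching the default described above
m <- 4      # number of chains
# Well mixed: all four chains sample the same N(0, 1) distribution.
mixed <- matrix(rnorm(n * m), nrow = n, ncol = m)
# Not mixed: each chain is stuck around a different location (0, 1, 2, 3).
stuck <- mixed + matrix(rep(0:3, each = n), nrow = n, ncol = m)

rhat_basic(mixed)   # very close to 1
rhat_basic(stuck)   # far above 1

# These simulated draws are independent, so n_eff equals n * m = 4000 and the
# Monte Carlo standard error of the mean is about sd / sqrt(n_eff).
mcse_mean <- sd(as.vector(mixed)) / sqrt(n * m)
mcse_mean           # tiny compared with the posterior sd of about 1
```

Real MCMC draws are autocorrelated, so n_eff is smaller than the raw number of draws and the MCSE is correspondingly larger than this independent-draws calculation suggests.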
With larger and more complex data sets and/or many predictors, computation speed can become a limiting factor. Some options:
- Parallel processing - rstan can run the chains on multiple processors if they are available; enable this with options(mc.cores = parallel::detectCores()).
- Mode-based approximations - stan_glm can be made as fast as glm while retaining the advantages of Bayesian inference by approximating the full Bayesian calculation. One method (algorithm = “optimizing”) uses a normal approximation centered at the posterior mode.
- See the ‘Scalability’ demo in tidy-ros.
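The two options above might look like this in rstanarm. This is a sketch, assuming the rstanarm package is installed; the simulated data frame fake and its size are made up for illustration, and the timings will vary by machine.

```r
library(rstanarm)

# Let rstan run the chains in parallel on all available cores.
options(mc.cores = parallel::detectCores())

# Hypothetical simulated data: one predictor, 10,000 observations.
set.seed(1)
n <- 10000
fake <- data.frame(x = rnorm(n))
fake$y <- 2 + 3 * fake$x + rnorm(n, sd = 1)

# Full Bayesian fit with the default HMC sampler (4 chains x 1000 draws).
t_mcmc <- system.time(
  fit_mcmc <- stan_glm(y ~ x, data = fake, refresh = 0)
)
# Normal approximation centered at the posterior mode: usually much faster.
t_opt <- system.time(
  fit_opt <- stan_glm(y ~ x, data = fake, algorithm = "optimizing")
)

t_mcmc["elapsed"]
t_opt["elapsed"]

# Both fits should give similar estimates (intercept near 2, slope near 3).
coef(fit_mcmc)
coef(fit_opt)

# For the MCMC fit, the summary includes the mcse, Rhat and n_eff diagnostics.
summary(fit_mcmc)
```

The approximation trades some accuracy in the tails of the posterior for speed, which matters most when the sample is small relative to the number of parameters.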
Other algorithms, such as ‘variational inference’, are available but are beyond the scope of this book.