Why containers for Data Science?

  • Containers are mostly used for:
    1. Packaging an environment for someone else to use
    2. Packaging a finished product (project/app/whatever) for archiving, reproducibility, or production

Potential examples:

  • When publishing, instead of providing code and data and then describing the environment you used, you can include a Dockerfile so anyone can pick up exactly where you left off
  • You have a project at work that someone needs to work with every week, no matter who picks it up or when
  • You’re publishing an R package and want to test specific features across different base R or package versions
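
As a rough illustration of the examples above, a Dockerfile for an R project might look something like the sketch below. The base image tag, the package, and the file names (`analysis.R`, `/project`) are all placeholders chosen for illustration, not anything prescribed by a particular project.

```dockerfile
# Minimal sketch; the R version, package, and file names are assumptions.
# The R version is a build argument so it can be swapped at build time.
ARG R_VERSION=4.3.1
FROM rocker/r-ver:${R_VERSION}

# Install the R packages the project depends on (dplyr is only an example)
RUN R -e "install.packages('dplyr')"

# Copy the project's code and data into the image
WORKDIR /project
COPY . /project

# By default, run the (hypothetical) analysis script
CMD ["Rscript", "analysis.R"]
```

Building with something like `docker build --build-arg R_VERSION=4.2.3 -t myproject:r4.2.3 .` would then produce the same project on a different base R version, which is one way to approach the package-testing case.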