Docker for Data Science

Learning objectives

  • Decide whether a container is the right tool for a given job.
  • Download and run pre-built Docker images.
  • Describe the stages of the Docker container lifecycle.
  • Build simple Dockerfiles for your own projects.

Why containers?

  • Containers are a way to save an entire machine’s state, rather than just project components
    • This extends beyond packages and libraries to the R or Python version itself, as well as any other tools
  • While containers are similar to VMs, they are much more single-purpose. They…
    • are quick to start
    • can be used for individual projects or scripts
  • The container’s configuration is code (infrastructure as code!) and so it’s easy to reproduce

https://www.reddit.com/r/ProgrammerHumor/comments/cw58z7/it_works_on_my_machine/

Why containers for Data Science?

  • Containers are mostly used for:
    1. Packaging an environment for someone else to use
    2. Packaging a finished product (project/app/whatever) for archiving, reproducibility, or production

Potential examples:

  • When publishing, instead of providing code, data, and describing the environment used, you can include a Dockerfile so anyone can pick up exactly where you left off
  • You have a project at work that needs to be interacted with every week no matter who looks at it or when
  • You’re publishing an R package and want to test specific features across different base R or package versions

Words of caution

  • Docker is limited to the resources you give it. If your dev machine is less than awesome, your Docker container will be much less than awesome
  • Docker is only allowed access to what you give it and may take some extra work to get running
  • Some workplaces may not be comfortable with Docker
  • Some use-cases may require direct access to the hardware and are incompatible with a container system
    • Sometimes computers do math differently (e.g., floating-point results can vary across CPU architectures)
  • Containers require proper setup!

Diving In

Docker Run

docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

  • docker run tells Docker to run the following image
  • Options are configured as needed
  • IMAGE takes the form <user>/<image> when pulling from Docker Hub (like CRAN, but for Docker)
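As a concrete example, the public rocker/rstudio image from Docker Hub serves RStudio Server on port 8787 (requires Docker to be installed and running; the password is whatever you choose):

```shell
# Pull (if needed) and run the rocker/rstudio image from Docker Hub
docker run --rm -p 8787:8787 -e PASSWORD=yourpassword rocker/rstudio
# Then open http://localhost:8787 in a browser (username: rstudio)
```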

Docker Compose

docker-compose.yml -> docker compose [-f <arg>...] [options] [COMMAND] [ARGS...]

  • The docker-compose file provides a structured way to describe one or more containers
  • Easy way to combine multiple services (maybe you want R + Python)
  • Can be used with the run (do something) or up command (be ready to do something)
  • Options are similar to docker run
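A minimal docker-compose.yml might look like this (image name, ports, and paths are illustrative):

```yaml
services:
  rstudio:
    image: rocker/rstudio
    ports:
      - "8787:8787"
    environment:
      - PASSWORD=yourpassword
    volumes:
      - .:/home/rstudio/project
```

Then `docker compose up` starts everything described in the file, and `docker compose down` tears it back down.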

Container Lifecycle

  • Change the Dockerfile, not the image
  • Images can be shared like code
    • Think Git!
  • There are services to provide private image registries for companies
  • Docker will usually pull an image automatically if it doesn’t already exist locally
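The share-images-like-code workflow can be sketched as follows (the username and image name are placeholders):

```shell
# Build an image from the Dockerfile in the current directory
docker build -t myuser/myimage:1.0 .

# Share it via a registry (Docker Hub by default)
docker push myuser/myimage:1.0

# Others pull it down, much like cloning a repo
docker pull myuser/myimage:1.0
```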

More on Docker Run

  • Docker Hub images are in the form <user>/<name> (alexkgold/plumber)
    • You can tag an image with a version number: <user>/<name>:<version>
  • --rm to remove the container on exit (probably not for production)
  • -d run in detached mode so the terminal is free for other uses
  • -p <host>:<container> publishes a port from inside the container to outside
  • --name to assign a name of your choice
  • -v <outside/directory>:<inside/directory> to expose a directory
    • ${PWD} expands to your current working directory
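Putting those options together in one command (the container name and mount paths are illustrative):

```shell
# Run detached, expose RStudio's port, name the container,
# mount the current directory, and clean up on exit
docker run --rm -d \
  -p 8787:8787 \
  --name my-analysis \
  -v ${PWD}:/home/rstudio/project \
  -e PASSWORD=yourpassword \
  rocker/rstudio
```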

Build a Dockerfile

  • Not the same as a docker-compose.yml
  • FROM sets the base image for the container
  • RUN runs any command as though it’s using the terminal
    • If using something fancy, you may need to install it first
  • COPY copies a file from host to container
  • CMD sets the command to run when the container starts
  • Images rebuild from the first instruction that changed; everything above it comes from the layer cache
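A minimal Dockerfile using those instructions might look like this (the R version, package, and file names are illustrative):

```dockerfile
# Base image: pin a specific R version for reproducibility
FROM rocker/r-ver:4.3.1

# Install packages as though at a terminal
RUN R -e 'install.packages("plumber")'

# Copy the API script from host into the container
COPY plumber.R /app/plumber.R

# Command to run when the container starts
CMD ["R", "-e", "plumber::pr_run(plumber::pr('/app/plumber.R'), host='0.0.0.0', port=8000)"]
```

Build it with `docker build -t myuser/my-api .` from the directory containing the Dockerfile.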

Trying out Docker

  1. Try out plumber penguins in your browser
  2. Kill it
  3. Do it again
  4. Poke around
  5. Kill it again
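Using the alexkgold/plumber image mentioned earlier, that cycle might look like this (assuming the API serves on plumber's default port, 8000):

```shell
# Run the API detached and visit http://localhost:8000 in your browser
docker run --rm -d -p 8000:8000 --name penguins alexkgold/plumber

docker ps              # poke around: list running containers
docker stop penguins   # kill it (--rm removes the container too)

# Do it again
docker run --rm -d -p 8000:8000 --name penguins alexkgold/plumber
docker stop penguins   # kill it again
```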

Meeting Videos

Cohort 1