Demystifying Docker

Learning objectives

  • Decide whether a container is the right tool for a given job.
  • Download and run pre-built Docker images.
  • Describe the stages of the Docker container lifecycle.
  • Write simple Dockerfiles for your own projects.

Why Docker matters for data science

  • Docker creates standardized environments that are:

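    1. Reproducible: the same image gives the same environment every time

    2. Portable: the same image runs on laptops, servers, and cloud machines that have Docker

    3. Collaborative: teammates share one image instead of each configuring their own setup

    4. Scalable: one image can be started as many containers, locally or on a cluster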

The Docker logo, a whale with shipping containers on its back

What is Docker?

  • An open-source tool for building, sharing, and running software in isolated, self-contained environments called containers

The Docker lifecycle and commands, showing that a Dockerfile produces a Docker Image, which leads to a Docker Container
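To make the lifecycle concrete, here is a minimal sketch assuming the official python:3.12-slim base image from Docker Hub; the image name hello-ds is a hypothetical placeholder. The Dockerfile is the recipe, docker build turns it into an image, and docker run starts a container from that image (the commands are shown as comments).

```dockerfile
# Stage 1 (Dockerfile): a text recipe describing the environment.
# Base image: a slim Debian-based image with Python 3.12 preinstalled.
FROM python:3.12-slim

# Default command a container runs when it starts.
CMD ["python", "-c", "print('hello from a container')"]

# Stage 2 (image): build an image from the Dockerfile in the current
# directory and tag it with the hypothetical name hello-ds:
#   docker build -t hello-ds .
#
# Stage 3 (container): start a container from the image; --rm deletes
# the container once its command finishes:
#   docker run --rm hello-ds
```

Each docker run starts a fresh container from the same, unchanged image.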

Specify your environment via a Dockerfile

  • A Dockerfile is the recipe the docker build command uses to build a Docker image

  • Dockerfiles are plain text files made of instructions such as FROM, RUN, COPY, and CMD

# 1. Declare the base image
FROM ubuntu:latest
# 2. Copy my-data.csv from the build context (normally the directory where you run docker build) into the image as /data/data.csv
COPY my-data.csv /data/data.csv
# 3. Print the first 10 lines of data.csv (head's default); this runs once, while the image is being built
RUN ["head", "/data/data.csv"]
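Because RUN executes while the image is being built, the head output from the example above appears only in the build log (BuildKit may collapse it unless you pass --progress=plain to docker build). A sketch of the same example using CMD instead, so the preview prints every time a container starts. It keeps the original file names, assumes my-data.csv sits next to the Dockerfile, uses the hypothetical image name csv-preview, and pins ubuntu:24.04 rather than latest as an assumption so that rebuilds get the same OS release.

```dockerfile
# Pin a specific Ubuntu release instead of "latest" so rebuilds match.
FROM ubuntu:24.04

# Copy my-data.csv from the build context into the image at /data/data.csv.
COPY my-data.csv /data/data.csv

# CMD runs when a container starts, not at build time:
# print the first 10 lines of the file (head's default).
CMD ["head", "/data/data.csv"]

# Build once, then every run prints the preview:
#   docker build -t csv-preview .
#   docker run --rm csv-preview
```

Arguments placed after the image name replace CMD, so docker run --rm csv-preview head -n 3 /data/data.csv prints only three lines.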

Docker images are a snapshot of your environment

  • Docker images bundle the software environment (e.g., base OS files, packages, code, data)

  • Docker images can be shared with others via Docker Hub or another container registry

  • In principle, an image can package an entire project (code, data, and dependencies) so that it runs standalone
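A sketch of an image that bundles a small project so it can run standalone and be shared. The script name analysis.py, the pandas==2.2.2 pin, and the Docker Hub repository your-username/my-project are hypothetical placeholders; substitute your own.

```dockerfile
# Start from an official Python base image with a pinned version tag.
FROM python:3.12-slim

# Work inside /app; relative paths below resolve against it.
WORKDIR /app

# Install a pinned package version so every build gets the same library.
RUN pip install --no-cache-dir pandas==2.2.2

# Bundle the data and the (hypothetical) analysis script into the image.
COPY my-data.csv /data/data.csv
COPY analysis.py /app/analysis.py

# Run the analysis by default when a container starts.
CMD ["python", "analysis.py"]

# Share via Docker Hub (requires an account and docker login):
#   docker build -t your-username/my-project:0.1 .
#   docker push your-username/my-project:0.1
# Collaborators can then pull and run the same environment with:
#   docker run --rm your-username/my-project:0.1
```

Pinning both the base image tag and package versions is what lets a shared image stand in for "works on my machine".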

Containers are ephemeral instances of a Docker image

  • By default, changes made inside a container never reach its image: they vanish when the container is removed, and a new container starts fresh from the image

  • Mounted volumes let data persist across containers started from the same image

  • A container is an isolated process running on top of the image's read-only layers (built ahead of time from the Dockerfile) plus a thin writable layer of its own
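A sketch of the difference a volume makes, assuming a hypothetical image name run-logger and volume name ds-results. Each container appends a timestamp to /results/log.txt and prints the file: without a volume every container starts from the image's clean filesystem and shows one line, while with the named volume mounted the log keeps growing across containers.

```dockerfile
FROM ubuntu:24.04

# Directory where the container writes its output.
WORKDIR /results

# Append the current time to a log file, then print the whole log.
CMD ["sh", "-c", "date >> /results/log.txt && cat /results/log.txt"]

# Build once:
#   docker build -t run-logger .
#
# Without a volume: each new container starts from the image,
# so the log always shows a single line.
#   docker run --rm run-logger
#
# With a named volume mounted at /results: the data outlives each
# container, so the log gains one line per run.
#   docker run --rm -v ds-results:/results run-logger
```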

Meeting Videos

Cohort 1