DevOps for Data Science Book Club
Welcome
Book club meetings
Pace
Learning objectives
Introduction
DevOps?
DevOps vs IT/Admins
Red Flags about IT/Admins
This Book
DevOps Lessons for Data Science
The 5 Tenets of DevOps
DevOps for Data Science
Meeting Videos
Cohort 1
I DevOps Lessons for Data Science
1 Code promotion and integration
Why Do We Care About Code Promotion Workflows?
The Three Environments
Dev
Test
Prod
CI/CD
How Does it Work?
Per-Environment Configuration
Creating & Maintaining Identical Environments
Review
Meeting Videos
Cohort 1
2 Environments as Code
Recap
Intro Discussion
The Three Layers of Environments
The Package Environment
Please don’t
The Rooms of a House
There Can Only Be One
Why?
Virtual Environments to the Rescue
Using Renv
Sharing The Environment
Avoiding Conda in production
Taking Control of the System Environment
Further reading
Review
Meeting Videos
Cohort 1
3 Data Pipeline Architecture
Perspectives on Data Science Projects
Productionizing apps
The What & Where Questions of Storage
Factors we should consider when we choose our storage solution
Data authorization
Database-as-a-service
Review
Meeting Videos
Cohort 1
4 Designing Your Application Layers
Perspectives on Data Science Projects
Productionizing apps
The What & Where Questions of Storage
Factors we should consider when we choose our storage solution
Data authorization
Database-as-a-service
Review
Meeting Videos
Cohort 1
5 Logging and Monitoring
Logs and Metrics
log4r Basics
log4r Layouts
Metrics
Review Discussion
Meeting Videos
Cohort 1
II Tools of the Trade
6 The Terminal
6.1 The Three Programs
6.2 Why should you learn to use the terminal?
6.3 Selecting your terminal
6.4 Selecting your shell
Meeting Videos
Cohort 1
7 Connecting Securely with SSH
7.1 SSH
7.2 Keys
7.3 What’s the difference?
7.4 When should you move your private key?
7.5 Public keys as locks
7.6 Practical SSH Usage
7.7 Comprehension Questions:
Meeting Videos
Cohort 1
8 Getting Comfortable on the Command Line
bash
Commands
Listing files in the current directory
Long Commands
Directories and Files
What if you were a file?
Reading Text Files
The Pipe
Moving and Copying Files
Server Files
Writing Files
With the Command Line
With Command Line Text Editors
Memes About Exiting Vim
Meeting Videos
Cohort 1
9 Docker for Data Science
Why containers?
Why containers for Data Science?
Words of caution
Diving In
Docker Run
Docker Compose
Container Lifecycle
More on Docker Run
Build a Dockerfile
Trying out Docker
Meeting Videos
Cohort 1
III The Cloud
10 Getting Started with the Cloud
10.1 Quick Section Overview
10.2 Why cloud?
10.3 Things as a Service
10.4 Common Cloud Services
10.5 Discuss the Lab
10.6 Meeting Videos
10.6.1 Cohort 1
11 Basic Linux SysAdmin
Stand up an EC2 (chapter 10 lab review)
Create a non-root user
Add an ssh key
Notes on personal ssh key
Install R
Install RStudio Server
Tunnel to see it locally
Install Python and JupyterLab
Plumber in docker
Meeting Videos
Cohort 1
12 Intro to Computer Networks
Install nginx
Allow outside traffic
Edit nginx.conf
Add RStudio routing
Add plumber api routing
Meeting Videos
Cohort 1
13 DNS allows for human-readable addresses
Allocate an Elastic IP
Configure DNS Records
Reconfigure nginx
Meeting Videos
Cohort 1
14 You should use SSL/HTTPS
Get ready
certbot
rstudio in nginx.conf
Open the port
Meeting Videos
Cohort 1
15 Choosing the Right Server for You
What is Computing?
CPU: The physical brain
How do I go faster…?
GPU
So Should I Get GPU…?
RAM
Disk (Storage)
Scenarios
Lab
Meeting Videos
Cohort 1
IV Making it Enterprise-Grade
16 Open Source Data Science in the Enterprise
Data Science Sandboxes
Read-only access
Package availability
Promotion
Dev/Test/Prod for Admins
Infrastructure as Code (IaC)
Open Source in Enterprise
Package Restrictions
Meeting Videos
Cohort 1
17 Enterprise Networking
Enterprise Networking Terminology
Benefits of Private Networks
Issues with Inbound Proxies
Issues with Outbound Proxies
Meeting Videos
Cohort 1
18 Auth in Enterprise
Introduction
LDAP/AD
Single Sign On (SSO)
Permissions
Why should we care?
Learning Objectives
Meeting Videos
Cohort 1
19 Enterprise Server Management
19.1 Simplifying Complexity Using Infrastructure as Code (IaC)
19.2 How to Scale and Stabilize Workloads in Enterprise-Level Organisations
19.3 Facilitating Enterprise Workloads Management with Kubernetes
Meeting Videos
Cohort 1
Learning objectives
I am a big fan!
Students who read with LOs in mind retain more.
Tips:
“When you finish reading this Chapter, you will be able to…”
Very roughly 1 per section.
I’ll try to have them in the repo by Tuesday each week.