Practical Statistics for Data Scientists Book Club
5.4 Lift
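library(yardstick)  # provides lift_curve() and the two_class_example data set
library(ggplot2)    # provides the autoplot() generic used to draw the curve
# Lift curve for the "Class1" predicted probabilities against the true class
autoplot(lift_curve(two_class_example, truth, Class1))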
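To make clearer what the lift curve measures, here is a minimal hand-rolled sketch of the calculation. It ranks cases from most to least likely to be the event. At each cutoff it divides the share of all events found so far by the share of cases tested so far. A lift of 2 means the model finds events twice as fast as random selection. The function name `manual_lift()` and the toy data are hypothetical. The sketch assumes distinct scores and treats "Class1" as the event. yardstick's own `lift_curve()` may differ in details such as tie handling and reporting percentages rather than fractions.

```r
# Hypothetical helper: basic lift computation, assuming distinct scores
# (no tie handling) and that `event` is the positive class label.
manual_lift <- function(truth, prob, event = "Class1") {
  ord <- order(prob, decreasing = TRUE)          # rank cases, most likely first
  is_event <- truth[ord] == event                # TRUE where the ranked case is an event
  n <- length(is_event)
  pct_tested <- seq_len(n) / n                   # fraction of cases examined so far
  pct_found <- cumsum(is_event) / sum(is_event)  # fraction of all events captured so far
  data.frame(
    pct_tested = pct_tested,
    pct_found  = pct_found,
    lift       = pct_found / pct_tested          # improvement over random selection
  )
}

# Hypothetical toy data: 4 events among 8 cases
truth <- factor(c("Class1", "Class1", "Class2", "Class1",
                  "Class2", "Class2", "Class1", "Class2"),
                levels = c("Class1", "Class2"))
prob <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10)

res <- manual_lift(truth, prob)
print(res)
```

The two highest-scored cases are both events, so the lift is 2 at the first two cutoffs. It falls to exactly 1 once every case has been tested, because at that point the model can do no better than random selection.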