Introduction to Statistical Learning Using R Book Club
Welcome
Book club meetings
1st edition vs 2nd edition
Pace
1 Introduction
1.1 What is statistical learning?
1.2 Why ISLR?
1.3 Premises of ISLR
1.4 Notation
1.5 What have we gotten ourselves into?
1.6 Where’s the data?
1.7 Some useful resources:
1.8 What is covered in the book?
1.9 How is the book divided?
1.10 Some examples of the problems addressed with statistical analysis
1.11 Datasets provided in the ISLR2 package
1.11.1 Example datasets
1.12 Meeting Videos
1.12.1 Cohort 1
1.12.2 Cohort 2
1.12.3 Cohort 3
1.12.4 Cohort 4
1.12.5 Cohort 5
2 Statistical Learning
2.1 What is Statistical Learning?
2.1.1 Why Estimate \(f\)?
2.1.2 How do we estimate \(f\)?
2.1.3 Prediction Accuracy vs Model Interpretability
2.1.4 Supervised Versus Unsupervised Learning
2.1.5 Regression Versus Classification Problems
2.2 Assessing Model Accuracy
2.2.1 Measuring Quality of Fit
2.2.2 The Bias-Variance Trade-Off
2.2.3 The Classification Setting
2.3 Exercises
2.4 Meeting Videos
2.4.1 Cohort 1
2.4.2 Cohort 2
2.4.3 Cohort 3
2.4.4 Cohort 4
2.4.5 Cohort 5
3 Linear Regression
3.1 Questions to Answer
3.2 Simple Linear Regression: Definition
3.3 Simple Linear Regression: Visualization
3.4 Simple Linear Regression: Math
3.4.1 Visualization of Fit
3.5 Assessing Accuracy of Coefficient Estimates
3.6 Assessing the Accuracy of the Model
3.7 Multiple Linear Regression
3.7.1 Important Questions
3.8 Qualitative Predictors
3.9 Extensions
3.10 Potential Problems
3.11 Answers to the Marketing Plan questions
3.12 Comparison of Linear Regression with K-Nearest Neighbors
3.13 Meeting Videos
3.13.1 Cohort 1
3.13.2 Cohort 2
3.13.3 Cohort 3
3.13.4 Cohort 4
3.13.5 Cohort 5
4 Classification
4.1 An Overview of Classification
4.2 Why NOT Linear Regression?
4.3 Logistic Regression
4.3.1 The Logistic Model
4.3.2 Estimating the Regression Coefficients
4.3.3 Multiple Logistic Regression
4.3.4 Multinomial Logistic Regression
4.4 Generative Models for Classification
4.5 A Comparison of Classification Methods
4.5.1 Linear Discriminant Analysis for p = 1
4.5.2 Linear Discriminant Analysis for p > 1
4.5.3 Quadratic Discriminant Analysis (QDA)
4.5.4 Naive Bayes
4.6 Summary of the classification methods
4.6.1 An Analytical Comparison
4.6.2 An Empirical Comparison
4.7 Generalized Linear Models
4.8 Linear regression with count data - negative values
4.9 Linear regression with count data - heteroscedasticity
4.10 Problems with linear regression of count data
4.11 Poisson distribution
4.12 Poisson Regression Model mean (lambda)
4.13 Estimating the Poisson Regression parameters
4.14 Interpreting Poisson Regression
4.15 Advantages of Poisson Regression
4.16 Generalized Linear Models
4.17 Addendum - Logistic Regression Assumptions
4.18 Lab: Classification Methods
4.19 Exercises
4.19.1 Conceptual
4.20 Meeting Videos
4.20.1 Cohort 1
4.20.2 Cohort 2
4.20.3 Cohort 3
4.20.4 Cohort 4
4.20.5 Cohort 5
5 Resampling Methods
5.1 Validation Set Approach
5.2 Validation Error Rate Varies Depending on Data Set
5.3 Leave-One-Out Cross-Validation (LOOCV)
5.4 Advantages of LOOCV over Validation Set Approach
5.5 k-fold Cross-Validation
5.6 Graphical Illustration of k-fold Approach
5.7 Advantages of k-fold Cross-Validation over LOOCV
5.8 Bias-Variance Tradeoff and k-fold Cross-Validation
5.9 Cross-Validation on Classification Problems
5.10 Logistic Polynomial Regression, Bayes Decision Boundaries, and k-fold Cross-Validation
5.11 The Bootstrap
5.12 A simple bootstrap example
5.13 Population Distribution Compared to Bootstrap Distribution
5.14 Bootstrap Standard Error
5.15 Lab: Cross-Validation and the Bootstrap
5.15.1 The Validation Set Approach
5.15.2 Leave-One-Out Cross-Validation
5.15.3 k-Fold Cross-Validation
5.15.4 The Bootstrap
5.16 Meeting Videos
5.16.1 Cohort 1
5.16.2 Cohort 2
5.16.3 Cohort 3
5.16.4 Cohort 4
5.16.5 Cohort 5
6 Linear Model Selection and Regularization
6.1 Subset Selection
Best Subset Selection (BSS)
BSS Algorithm
Best Subset Selection (BSS)
Forward Stepwise Subset Selection (FsSS)
Backward Stepwise Subset Selection (BsSS)
Hybrid searches
Choosing the best model
Adjustment Methods
Avoiding Adjustment Methods
6.2 Shrinkage Methods
Overview
OLS review
Ridge Regression
Ridge Regression, Visually
Preprocessing
The Lasso
The Lasso, Visually
How lasso eliminates predictors
Bayesian Interpretation
6.3 Dimension Reduction Methods
The Math
Principal Components Regression
Partial Least Squares
6.4 Considerations in High Dimensions
Lasso (etc) vs Dimensionality
6.5 Exercises
6.5.1 Exercise 7
6.6 Meeting Videos
6.6.1 Cohort 1
6.6.2 Cohort 2
6.6.3 Cohort 3
6.6.4 Cohort 4
6.6.5 Cohort 5
7 Moving Beyond Linearity
7.1 Polynomial and Step Regression
7.2 Splines
7.3 Generalized Additive Models
7.4 Conceptual Exercises
7.5 Applied Exercises
7.6 Meeting Videos
7.6.1 Cohort 1
7.6.2 Cohort 2
7.6.3 Cohort 3
7.6.4 Cohort 4
7.6.5 Cohort 5
8 Tree-based methods
8.1 Introduction: Tree-based methods
8.2 Regression Trees
8.3 Terminology:
8.4 Interpretation of results: regression tree (Hitters data)
8.5 Tree-building process (regression)
8.6 Recursive binary splitting
8.7 Recursive binary splitting (continued)
8.8 But…
8.9 Pruning a tree
8.10 An example: tree pruning (Hitters data)
8.11 Classification trees
8.12 Classification trees (continued)
8.13 Example: classification tree (Heart data)
8.14 Advantages/Disadvantages of decision trees
8.15 Bagging
8.16 Bagging (continued)
8.17 Out-of-bag error estimation
8.18 Variable importance measures
8.19 Random forests
8.20 Random forests: advantages over bagging
8.21 Example: Random forests versus bagging (gene expression data)
8.22 Boosting
8.23 Boosting algorithm
8.24 Example: Boosting versus random forests
8.25 Bayesian additive regression trees (BART)
8.26 But first, BART notation:
8.27 Now, the BART algorithm
8.28 BART algorithm: iteration 2 and on
8.29 BART algorithm: figure
8.30 BART: additional details
8.31 To apply BART:
8.32 Lab: Tree-Based Methods - Fitting Classification Trees
8.33 Exploratory Data Analysis (EDA)
8.34 Correlation Analysis
8.35 Build a model
8.36 Visualize our decision tree
8.37 Evaluate the model
8.38 Tuning the model
8.39 Evaluate the model
8.40 Visualize the tuned decision tree (classification)
8.41 Variable importance
8.42 Final evaluation
8.43 Fitting Regression Trees
8.44 Decision Trees (Regression) Explained (StatQuest)
8.44.1 EDA
8.45 Correlation Analysis
8.46 Build a regression tree
8.47 Visualize our decision tree
8.48 Evaluate the model
8.49 Tuning the regression model
8.50 Evaluate the model
8.51 Visualize the tuned decision tree (regression)
8.52 Variable importance
8.53 Final evaluation
8.54 Bagging and Random Forests
8.55 Random Forest Diagram
8.56 Example
8.57 Evaluate the model
8.58 Variable importance
8.59 Random Forest using a set of features (mtry)
8.60 Evaluate the model
8.61 Variable importance
8.62 Boosting
8.63 Evaluate the model
8.64 Tuning the xgboost regression model
8.65 Grid tuning with finetune::tune_race_anova()
8.66 Evaluate the model
8.67 Final evaluation
8.68 Feature importance
8.69 Meeting Videos
8.69.1 Cohort 1
8.69.2 Cohort 2
8.69.3 Cohort 3
8.69.4 Cohort 4
8.69.5 Cohort 5
9 Support Vector Machines
9.1 Hyperplane
9.2 Separating Hyperplane
9.3 Maximal Margin Classifier
9.4 Mathematics of the MMC
9.5 Support Vector Classifiers
9.6 Mathematics of the SVC
9.7 Tuning Parameter
9.8 Nonlinear Classification
9.9 Support Vector Machines
9.10 Radial Kernels
9.11 SVM with Radial Kernels
9.12 More than Two Classes
9.13 Lab: Support Vector Classifier
9.13.1 Tuning
9.13.2 Linearly separable data
9.14 Meeting Videos
9.14.1 Cohort 1
9.14.2 Cohort 2
9.14.3 Cohort 3
9.14.4 Cohort 4
9.14.5 Cohort 5
10 Deep Learning
10.1 Introduction
10.2 Single Layer Neural Network
10.3 Lab: A Single Layer Network on the Hitters Data
10.4 Multilayer Neural Network
10.5 Convolutional Neural Network
10.6 Recurrent Neural Network
10.7 Backpropagation
10.8 Deep Learning part 2
10.9 Introduction
10.9.1 Multilayer neural networks
10.9.2 Convolutional Neural Networks (CNNs):
10.9.3 Recurrent Neural Networks (RNNs):
10.10 Case Study: RNN - Time Series
10.11 Meeting Videos
10.11.1 Cohort 1
10.11.2 Cohort 2
10.11.3 Cohort 3
10.11.4 Cohort 4
10.11.5 Cohort 5
11 Survival Analysis and Censored Data
11.1 What is survival data?
11.2 Introduction to Survival Analysis (zedstatistics)
11.3 Censored Data
11.4 Lab: Brain Cancer survival analysis
11.5 Survival Function
11.6 Kaplan-Meier survival curve
11.7 Kaplan-Meier survival curve in R
11.8 KM curve stratified by sex
11.9 Log-Rank test
11.10 Survminer package
11.11 Hazard Function
11.11.1 How is the hazard rate related to the survival probability?
11.12 Regression models
11.13 Proportional Hazards
11.14 Cox Proportional Hazards Model
11.15 Survival Curves
11.16 Additional Topics Covered in Text:
11.17 Conclusions
11.18 Meeting Videos
11.18.1 Cohort 1
11.18.2 Cohort 2
11.18.3 Cohort 3
11.18.4 Cohort 4
11.18.5 Cohort 5
12 Unsupervised Learning
12.1 Introduction
12.2 Principal component analysis
12.2.1 What are the steps to principal component analysis?
12.3 Geometric interpretation
12.4 Proportion of variance explained
12.5 The matrix decomposition
12.5.1 Matrix Completion
12.6 Clustering
12.7 K-means
12.8 Hierarchical clustering
12.8.1 Considerations on how to interpret dendrogram results
12.9 References
Meeting Videos
12.9.1 Cohort 1
12.9.2 Cohort 2
12.9.3 Cohort 3
12.9.4 Cohort 4
12.9.5 Cohort 5
13 Multiple Testing
13.1 How to deal with more than one hypothesis test
13.2 Hypothesis testing steps
13.3 m null hypotheses
13.4 Family-Wise Error Rate (FWER)
13.4.1 Controlling FWER
13.5 Power
13.6 False Discovery Rate (FDR)
13.7 Benjamini-Hochberg procedure
13.8 Case Study: Multiple hypothesis testing in Genomics
13.9 Load libraries and datasets
13.10 Multiple t-tests
13.10.1 FWER: Family-Wise Error Rate
13.10.2 FDR: False Discovery Rate
13.11 Replications
13.12 Lab: Multiple Testing
13.13 The Family-Wise Error Rate
13.14 The False Discovery Rate
13.15 A Re-Sampling Approach
13.16 Meeting Videos
13.16.1 Cohort 1
13.16.2 Cohort 2
13.16.3 Cohort 3
13.16.4 Cohort 4
13.16.5 Cohort 5
Abbreviations
Appendix: Bookdown and LaTeX Notes
Markdown highlighting
Text coloring
Section references
Footnotes
Formatting Text
Figures
Displaying Formula
Formatting
Symbols
Notation
Equations
Basic Equation
Case-When Equation (Large Curly Brace)
Aligned with Underbars
Greek letters
OLS review
\(\hat{\beta}^{OLS} \equiv \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{k=1}^{p}{\beta_k x_{ik}}\right)^2\)
\(\hat{\beta}^{OLS} \equiv \underset{\beta}{\operatorname{argmin}}\,\mathrm{RSS}\)
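To make the criterion concrete, here is a minimal R sketch (simulated data; the `rss()` helper and the coefficient values 2, 3, and -1.5 are illustrative assumptions, not from the book or its labs). Base R's `lm()` solves the least-squares problem directly, and numerically minimizing the written-out RSS with `optim()` recovers essentially the same coefficients.

```r
# Minimal sketch: OLS as "argmin of RSS" on simulated data.
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n)  # assumed "true" coefficients

# Route 1: lm() computes the exact least-squares solution.
fit <- lm(y ~ x1 + x2)
coef(fit)                    # beta_0-hat, beta_1-hat, beta_2-hat

# Route 2: minimize RSS(beta) = sum_i (y_i - b0 - b1*x1_i - b2*x2_i)^2 directly.
rss <- function(b) sum((y - b[1] - b[2] * x1 - b[3] * x2)^2)
optim(c(0, 0, 0), rss)$par   # should be close to coef(fit)

sum(resid(fit)^2)            # RSS at the OLS minimum
```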