Feature Engineering and Selection Book Club
Welcome
Book club meetings
Pace
1 Introduction
1.1 Structure of the book
1.2 Good Practice guidelines
1.2.1 What is feature engineering?
1.2.2 Nature of modeling
1.3 A model with two predictors
1.4 Important concepts
1.4.1 Acknowledge vulnerabilities
1.4.2 The Modeling process
1.5 Predicting ridership on Chicago trains
1.5.1 Extra Resources
1.6 Meeting Videos
1.6.1 Cohort 1
2 Illustrative Example: Predicting Risk of Ischemic Stroke
2.1 Introduction
2.2 Example 1
2.2.1 Predictor Quality
2.2.2 Understanding interactions and multicollinearity
2.3 Example 2
2.3.1 Create a “null model” with no predictors to get baseline performance
2.3.2 With Tidymodels
2.3.3 Interaction exploration
2.4 Meeting Videos
2.4.1 Cohort 1
3 A Review of the Predictive Modeling Process
3.1 SLIDE 1
3.2 Meeting Videos
3.2.1 Cohort 1
4 Exploratory Visualizations
4.1 Data Visualization Chart
4.2 Introduction to the Chicago Train Ridership Dataset
4.3 Chicago Train Ridership dataset
4.4 Preliminary exploratory visualizations
4.5 Visualizations for Numeric Data
4.5.1 Box Plots, Violin Plots, and Histograms
4.5.2 Augmenting Visualizations through Faceting, Colors, and Shapes
4.5.3 Scatterplots
4.5.4 Scatterplots - Exclude U.S. holidays
4.5.5 Heatmaps
4.5.6 Correlation Matrix Plots
4.5.7 Line plots
4.5.8 Principal Component Analysis (PCA)
4.6 Visualizations for Categorical Data: Exploring the OKCupid dataset
4.6.1 Visualizing Relationships between Outcomes and Predictors
4.6.2 Exploring Relationships Between Categorical Predictors
4.7 Post Modeling Exploratory Visualizations
4.8 Residual Diagnostic Plots
4.9 Meeting Videos
4.9.1 Cohort 1
5 Encoding Categorical Predictors
5.1 Creating Dummy Variables for Unordered Categories
5.2 Encoding Predictors for Many Categories
5.3 Approaches for Novel Categories
5.4 Supervised Encoding Methods
5.4.1 Likelihood Encoding
5.5 Encodings for Ordered Data
5.6 Creating Features for Text Data
5.7 Factors versus Dummy Variables in Tree-Based Models
5.8 Summary
5.9 Meeting Videos
5.9.1 Cohort 1
6 Engineering Numeric Predictors
6.1 Problematic Characteristics of Predictors
6.1.1 Dealing with Skewed Data
6.1.2 Standardizing
6.2 Expanding Numeric Transformations
6.2.1 Nonlinear Features via Basis Expansions and Splines
6.2.2 Discretize Predictors as a Last Resort
6.3 New Features from Multiple Predictors
6.3.1 Linear Projection Methods
6.4 Meeting Videos
6.4.1 Cohort 1
7 Detecting Interaction Effects
7.1 Introduction
7.2 Four types of Interactions
7.2.1 Building the base-model for Ames data
7.3 Guiding Principles in the Search for Interactions
7.4 Practical Considerations
7.5 The Brute-Force Approach to Identifying Predictive Interactions
7.5.1 Simple Screening
7.5.2 Penalized Regression
7.5.3 Practical example with Ames data and glmnet
7.6 Approaches when Complete Enumeration is Practically Impossible
7.6.1 Guiding Principles and Two-stage Modeling
7.6.2 Tree-based Methods
7.6.3 The Feasible Solution Algorithm
7.7 Other Potentially Useful Tools
7.8 Conclusion
7.9 Meeting Videos
7.9.1 Cohort 1
8 Handling Missing Data
8.1 Types of missing data
8.2 Missing data mechanisms
8.3 Statistical Rethinking (Bayesian) Chapter 20 - Missing Data & Other Opportunities
8.4 Why is detecting the missing data mechanism important?
8.5 Visualizing Missing Information
8.6 Exploring pairwise relationships between predictors
8.7 Missing Values for the Chicago ridership data
8.8 Missing data patterns for stations originally in the Chicago ridership data
8.9 Models that are Resistant to Missing Values
8.10 Deletion of Data
8.11 Encoding Missingness
8.12 Imputation methods
8.13 Summary
8.14 Meeting Videos
8.14.1 Cohort 1
9 Working with Profile Data
9.1 Illustrative Data: Pharmaceutical Manufacturing Monitoring
9.1.1 Introduction
9.1.2 IMPORTANT DEFINITIONS!
9.1.3 Preliminary Results
9.2 What are the Experimental Unit and the Unit of Prediction?
9.3 Reducing Background
9.4 Reducing Other Noise
9.5 Exploiting Correlation
9.5.1 Altogether
9.6 Impacts of Data Processing on Modeling
9.6.1 Cross Validation
9.6.2 Model Selection
9.7 Summary
9.8 Meeting Videos
9.8.1 Cohort 1
10 Feature Selection Overview
10.1 Introduction
10.2 Feature selection
10.3 Classes
10.4 Irrelevant features
10.5 Overfitting
10.6 A case study
10.7 Conclusion
10.8 Meeting Videos
10.8.1 Cohort 1
11 Greedy Search Methods
11.1 Parkinson’s Disease Data
11.2 Simple Filters
Real Data is complex
Converting to p-values
Issues with Simple Filters
Parkinson’s Disease Data
Summarizing Simple Filters
11.3 Recursive Feature Elimination
11.4 Step-wise Selection
How does it work?
Why is Step-Wise Selection Ungood?
Step-wise selection has two primary faults:
Step-wise Selection Example
“Our recommendation is to avoid this procedure altogether.”
11.5 Meeting Videos
11.5.1 Cohort 1
12 Global Search Methods
12.1 Naive Bayes Models
12.1.1 Computing the joint-likelihoods
12.1.2 Major Drawbacks
12.2 Simulated Annealing
12.2.1 Selecting Features without Overfitting
12.2.2 Application to Modeling the OkCupid Data
12.2.3 Examining Changes in Performance
12.2.4 Grouped Qualitative Predictors Versus Indicator Variables
12.2.5 The Effect of the Initial Subset
12.3 Genetic Algorithms
12.3.1 External Validation
12.3.2 Coercing Sparsity
12.4 Summary
12.5 Meeting Videos
12.5.1 Cohort 1