Modelado Tidy con R - Club de Lectura
Welcome
Book club meetings
1 Software for modeling
1.1 The pit of success
1.2 Types of models
1.3 Terminology
1.4 The data analysis process
1.5 The modeling process
1.6 Meeting videos
1.6.1 Cohort 1
2 A tidyverse primer
2.1 Tidyverse Design Principles
2.2 Design for Humans - Overview
2.3 Design for Humans and the Tidyverse
2.4 Reusing existing data structures
2.5 Designed for the pipe
2.6 Designed for Functional Programming
2.7 Tibbles vs. Data Frames
2.8 How to read and wrangle data
2.9 Further Reading
2.10 Meeting videos
2.10.1 Cohort 1
3 A review of R modeling fundamentals
3.1 R formula syntax
3.1.1 Recap
3.2 Inspecting and developing models
3.3 More of {base} and {stats}
3.4 Why Tidy Principles and {tidymodels}?
3.5 Meeting videos
3.5.1 Cohort 1
Tidyverse review
Using pipes
Iterations in R
In parallel
References
Meeting videos
Cohort 1
Basics
4 The Ames housing data
4.1 Pittsburgh: a parallel real-world example
4.2 Meeting videos
4.2.1 Cohort 1
5 Spending our data
5.1 Spending our data
5.2 Common methods for splitting data
5.3 Class imbalance
5.3.1 Stratified sampling simulation
5.4 Continuous outcome data
5.5 Time series data
5.6 Multi-level data
5.7 What proportion should be used?
5.8 Summary
5.8.1 References
5.9 Meeting videos
5.9.1 Cohort 1
6 Fitting models with parsnip
6.1 Create a model
6.1.1 Different Modeling Interfaces
6.1.2 Model Specification
6.1.3 Fitting the Model
6.1.4 Generalized Model Arguments
6.2 Use the Model Results
6.3 Making Predictions
6.4 Packages Adjacent to {tidymodels}
6.5 Summary
6.6 Meeting videos
6.6.1 Cohort 1
AI Ethics
6.7 Meeting videos
6.7.1 Cohort 1
7 A model workflow
7.1 Workflows
7.2 Demonstration
7.2.1 Some data exploration and cleaning
7.3 Modeling with workflows
7.3.1 Different model, same recipe
7.3.2 Same model, different preprocessing
7.4 Managing many workflows
7.5 Notes
7.6 Meeting videos
7.6.1 Cohort 1
8 Feature engineering with recipes
8.1 Meeting videos
8.1.1 Cohort 1
9 Judging model effectiveness
9.1 Measures of Model Fit
9.2 Disclaimers
9.3 Regression Metrics
9.4 Binary Classification Metrics
9.5 References
9.6 Meeting videos
9.6.1 Cohort 1
Review of chapters 4-9
9.7 Meeting videos
9.7.1 Cohort 1
Tools for Creating Effective Models
10 Resampling for evaluating performance
10.1 Why?
10.2 Resubstitution approach
10.3 Resampling methods
10.3.1 Cross-validation
10.3.2 Validation sets
10.3.3 Bootstrapping
10.3.4 Rolling forecasting origin resampling
10.4 Estimating performance
10.5 Parallel processing
10.6 Saving the resampled objects
10.7 Meeting videos
10.7.1 Cohort 1
11 Comparing models with resampling
11.1 Calculate performance statistics
11.2 Calculate performance statistics: {workflowsets}
11.3 Within-resample correlation
11.4 Practical effect size
11.5 Simple Comparison
11.6 Bayesian methods
11.7 Meeting videos
11.7.1 Cohort 1
12 Model tuning and the dangers of overfitting
12.1 What is a Tuning Parameter?
12.1.1 Examples
12.2 When not to tune
12.3 Decisions, Decisions…
12.4 What Metric Should We Use?
12.5 Can we make our model too good?
12.6 Tuning Parameter Optimization Strategies
12.7 Tuning Parameters in tidymodels {dials}
12.8 Let’s try an example:
12.9 Build our random forest model:
12.10 Add tuning parameters:
12.11 Updating tuning parameters:
12.12 Finalizing tuning parameters:
12.13 What is next?
12.14 Meeting videos
12.14.1 Cohort 1
13 Grid search
13.1 Regular and non-regular grids
13.1.1 Regular Grids
13.1.2 Irregular Grids
13.2 Evaluating the grid
13.3 Finalizing the model
13.4 Tools for efficient grid search
13.4.1 Submodel optimization
13.4.2 Parallel processing
13.4.3 Benchmarking Parallel with boosted trees
13.4.4 Racing Methods
13.5 Chapter Summary
13.6 Meeting videos
13.6.1 Cohort 1
14 Iterative search
14.1 SVM model as motivating example
14.2 Bayesian Optimization
14.2.1 Gaussian process model, at a high level
14.3 Simulated annealing
14.3.1 How it works
14.3.2 The tune_sim_anneal() function
14.4 References
14.5 Meeting videos
14.5.1 Cohort 1
15 Screening Many Models
15.1 Obligatory Setup
15.2 Creating workflow_sets
15.3 Ranking models
15.4 Finalizing the model
15.5 Meeting videos
15.5.1 Cohort 1
Review of chapters 10-15
15.6 Meeting videos
15.6.1 Cohort 1
16 Dimensionality reduction
16.1 {recipes} without {workflows}
16.2 Principal Component Analysis (PCA)
16.3 Partial Least Squares (PLS)
16.4 Independent Component Analysis (ICA)
16.5 Uniform Manifold Approximation and Projection (UMAP)
16.6 Modeling
16.7 Meeting videos
16.7.1 Cohort 1
Other Topics
17 Encoding categorical data
17.1 Slide 1 Title
17.2 Slide 2 Title
17.3 Meeting videos
17.3.1 Cohort 1
18 Explaining models and predictions
18.1 Chapter 18 Setup
18.2 Overview
18.3 Local Explanations
18.4 Local Explanations for Interactions
18.5 Global Explanations
18.6 Global Explanations from Local Explanations
18.7 References
18.8 Meeting videos
18.8.1 Cohort 1
19 When should you trust predictions?
19.1 Equivocal Results
19.2 Model Applicability
19.3 Meeting videos
19.3.1 Cohort 1
20 Ensembles of models
20.1 Ensembling
20.2 Ensembling with stacks!
20.3 Define some models
20.4 Initialize and add members to the stack
20.5 Blend, fit, predict
20.6 Meeting videos
20.6.1 Cohort 1
21 Inferential analysis
21.1 Dataset used for demonstrating inference
21.2 Tidy method from the {broom} package
21.3 {infer} for simple, high-level hypothesis testing
21.3.1 p-value for independence based on simulation with permutation
21.3.2 Confidence interval for correlation based on simulation with bootstrapping
21.3.3 Use theory instead of simulation
21.3.4 Linear models with multiple explanatory variables
21.4 reg_intervals from {rsample}
21.5 Inference with lower-level helpers
21.6 Meeting videos
21.6.1 Cohort 1
8.1 Meeting videos
8.1.1 Cohort 1