Analyzing Baseball Data with R (3e) Book Club

2.11 Further Reading

  • R for Data Science (2e)
  • Modern Data Science with R
  • A ModernDive into R and the Tidyverse
  • R for the Rest of Us
  • R Packages
  • Happy Git and GitHub for the useR