The Great Streak
- Play-by-play data are now available on Retrosheet and can be downloaded with the baseballr package
- Here we gather our data from Baseball Reference, via the book's abdwr3edata package
library(abdwr3edata)
# Game-by-game batting log for Joe DiMaggio's 1941 season (from Baseball Reference)
joe <- dimaggio_1941
# Check DiMaggio's batting average for the season
joe |> dplyr::summarize(AVG = sum(H) / sum(AB))
## # A tibble: 1 × 1
## AVG
## <dbl>
## 1 0.357
A hitting streak is commonly defined as the number of consecutive games in which a player gets at least one base hit. - Analyzing Baseball Data with R
However…
A consecutive hitting streak shall not be terminated if a batter’s plate appearance results in a base on balls, hit batsman, defensive interference or obstruction or a sacrifice bunt. A sacrifice fly shall terminate the streak - MLB Official Rulebook
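Under the rule quoted above, a hitless game ends a streak only if the batter had an official at-bat or a sacrifice fly. The sketch below is a minimal base-R illustration under stated assumptions: `official_streaks()` is a hypothetical helper, the per-game vectors `H`, `AB` and `SF` (sacrifice flies) are assumed inputs, and we have not checked whether `dimaggio_1941` carries a sacrifice-fly column. Treating a "neutral" game (walks, hit batsman, interference/obstruction, sacrifice bunts, or no plate appearance at all) as neither extending nor ending a streak is also our assumption; the quoted rule only says such a game does not terminate it.

```r
# Hypothetical helper: hitting-streak lengths under the official-rule reading.
# Assumes per-game vectors H (hits), AB (official at-bats), SF (sacrifice flies).
official_streaks <- function(H, AB, SF) {
  # Assumption: a hitless game with no official at-bat and no sacrifice fly
  # neither extends nor ends a streak, so we drop it before encoding runs
  neutral <- H == 0 & AB == 0 & SF == 0
  had_hit <- as.integer(H[!neutral] > 0)
  runs <- rle(had_hit)
  runs$lengths[runs$values == 1]
}

# Made-up toy data: game 3 is walks only, game 5 is a sacrifice fly only
H  <- c(1, 2, 0, 1, 0, 1)
AB <- c(4, 3, 0, 4, 0, 4)
SF <- c(0, 0, 0, 0, 1, 0)

official_streaks(H, AB, SF)          # 3 1: the walks-only game does not break the streak

# Naive definition: any hitless game ends the streak
naive <- rle(as.integer(H > 0))
naive$lengths[naive$values == 1]     # 2 1 1
```

The sacrifice-fly game still ends the streak, as the rule requires, because it is not treated as neutral.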
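# 1 = at least one hit in the game, 0 = hitless (ignores the walk/sacrifice exceptions)
joe <- joe |> dplyr::mutate(had_hit = dplyr::if_else(H > 0, 1, 0))
dplyr::pull(joe, had_hit)
## [1] 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 0 0 1 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1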
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1
## [112] 0 0 0 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1
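What if we want to find all of a player's hitting streaks? A streak is a run of consecutive 1s in the had_hit vector, so base R's rle() (run-length encoding) does most of the work.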
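# Run-length encode a 0/1 vector into a tibble with one row per run:
# `lengths` is the run length, `values` is the repeated value
streaks <- function(y) {
  x <- rle(y)
  class(x) <- "list"  # drop the "rle" class so as_tibble() treats it as a plain list
  tibble::as_tibble(x)
}
# Hitting streaks are the runs of 1s
joe |>
  dplyr::pull(had_hit) |>
  streaks() |>
  dplyr::filter(values == 1) |>
  dplyr::pull(lengths)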
## [1] 8 3 2 1 3 56 16 4 2 4 7 1 5 2
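To see what streaks() works with, here is rle() on a small made-up vector. It returns the run lengths and the value of each run; this sketch swaps the tibble for a base-R data frame so it runs without extra packages.

```r
# Made-up game-by-game hit indicator (1 = at least one hit)
had_hit <- c(1, 1, 1, 0, 1, 1, 0, 0, 1)

runs <- rle(had_hit)
runs$lengths   # 3 1 2 2 1
runs$values    # 1 0 1 0 1

# Same idea as streaks(), but as a base-R data frame instead of a tibble
runs_df <- data.frame(lengths = runs$lengths, values = runs$values)

# Hitting streaks are the runs whose value is 1
hit_streaks <- runs_df$lengths[runs_df$values == 1]
hit_streaks        # 3 2 1
max(hit_streaks)   # longest streak: 3
```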
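We can also find streaks of games without a hit. DiMaggio's longest hitless stretch in 1941 was only 3 games.
joe |>
  dplyr::pull(had_hit) |>
  streaks() |>
  dplyr::filter(values == 0) |>
  dplyr::pull(lengths)
## [1] 3 1 2 3 2 1 2 2 3 3 1 1 1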
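Run lengths alone do not say which games a streak covers. Because runs are consecutive, the last game of each run is the cumulative sum of the lengths, and the first game is that value minus the length plus one. The helper `locate_streaks()` below is hypothetical and is shown on made-up data. Reading the had_hit vector printed earlier, the same arithmetic places DiMaggio's 56-game run at positions 29 through 84 of his 1941 log.

```r
# Hypothetical helper: one row per run, with the first and last game
# (position in the vector) that the run covers
locate_streaks <- function(y) {
  runs <- rle(y)
  end <- cumsum(runs$lengths)        # last game of each run
  start <- end - runs$lengths + 1    # first game of each run
  data.frame(value = runs$values, length = runs$lengths,
             start = start, end = end)
}

# Made-up indicator: hitless stretches are the runs with value 0
had_hit <- c(1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1)
runs <- locate_streaks(had_hit)

# Longest hitless stretch (which.max() keeps the first one on ties)
hitless <- runs[runs$value == 0, ]
hitless[which.max(hitless$length), ]   # length 3, games 3-5

# Longest hitting streak
hits <- runs[runs$value == 1, ]
hits[which.max(hits$length), ]         # length 4, games 6-9
```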
10.0.1 Moving Batting Averages
- Suppose we are interested in a player's batting average over a moving window (e.g., a stretch of 10 games: 1-10, 2-11, 3-12, …)
- We create a new function, moving_average(), featuring rollmean() and rollsum() from the zoo package
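Written out, the moving batting average for a window of $w$ games starting at game $i$ is total hits over total at-bats in that window. It is a ratio of sums, not an average of per-game averages, so a game with no official at-bat contributes nothing instead of an undefined 0/0. Here $H_j$ and $AB_j$ are hits and at-bats in game $j$, and $N$ is the number of games:

$$
\mathrm{AVG}_i = \frac{\sum_{j=i}^{i+w-1} H_j}{\sum_{j=i}^{i+w-1} AB_j}, \qquad i = 1, 2, \ldots, N - w + 1
$$

The window's center, the mean of its game indices, is $i + (w - 1)/2$; for $w = 10$ and $i = 1$ that is 5.5.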
# transmute is superseded because you can perform the same job with mutate(.keep = "none").
library(zoo)
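moving_average <- function(df, width) {
  N <- nrow(df)
  df |>
    dplyr::mutate(
      # center of each window (e.g., 5.5 for games 1-10)
      Game = zoo::rollmean(1:N, k = width, fill = NA),
      # hits divided by at-bats within the window (a ratio of sums)
      Average = zoo::rollsum(H, width, fill = NA) /
        zoo::rollsum(AB, width, fill = NA),
      .keep = "none"  # keep only the new columns, like transmute()
    )
}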
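For comparison, here is a sketch of the same calculation in base R, with no zoo or dplyr. The function name `moving_average_base()` is hypothetical. It computes each window's totals as differences of cumulative sums and, unlike the zoo version, returns only complete windows rather than padding with `NA`.

```r
# Hypothetical base-R alternative to moving_average().
# Returns one row per complete window (no NA padding).
moving_average_base <- function(df, width) {
  N <- nrow(df)
  stopifnot(width >= 1, width <= N)
  starts <- seq_len(N - width + 1)        # first game of each window
  # Cumulative sums with a leading 0: window i..i+width-1 sums to
  # cum[i + width] - cum[i]
  cum_h  <- c(0, cumsum(df$H))
  cum_ab <- c(0, cumsum(df$AB))
  data.frame(
    Game    = starts + (width - 1) / 2,   # center of each window
    Average = (cum_h[starts + width] - cum_h[starts]) /
              (cum_ab[starts + width] - cum_ab[starts])
  )
}

# Made-up four-game log
toy <- data.frame(H = c(1, 2, 0, 1), AB = c(4, 4, 3, 4))
moving_average_base(toy, width = 2)
# Game 1.5, 2.5, 3.5; Average 3/8, 2/7, 1/7
```

With a 10-game window on a 139-game log this returns 130 rows, whereas the zoo version returns 139 rows, 9 of them `NA`; the non-missing values should agree.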
We then compute a 10-game moving average for DiMaggio and plot it along with his season batting average (red line) and the games in which he recorded a hit (blue ticks along the bottom axis).
joe_ma <- moving_average(joe, 10)
ggplot2::ggplot(joe_ma, ggplot2::aes(Game, Average)) +
  # 10-game moving batting average
  ggplot2::geom_line() +
  # season batting average as a red reference line
  ggplot2::geom_hline(
    data = dplyr::summarize(joe, bavg = sum(H) / sum(AB)),
    ggplot2::aes(yintercept = bavg), color = "red"
  ) +
  # blue ticks on the bottom axis for games (Rk) with at least one hit
  ggplot2::geom_rug(
    data = dplyr::filter(joe, had_hit == 1),
    ggplot2::aes(Rk, .3 * had_hit), sides = "b",
    color = "blue"
  )
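## Warning: Removed 9 rows containing missing values or values outside the scale range
## (`geom_line()`).
The warning is expected: with a 10-game window, fill = NA leaves 9 games (width - 1) without a complete window, and geom_line() drops those rows.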