4.6 Pythagorean Formula

Bill James, widely regarded as the godfather of sabermetrics, empirically derived the following non-linear formula, known as the Pythagorean expectation, to estimate a team's winning percentage from its runs scored (R) and runs allowed (RA).

\[\widehat{W_{\text{pct}}} = \frac{R^{2}}{R^{2} + {RA^{2}}}\]
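
As a quick illustration of how the formula behaves, the sketch below wraps it in a small function and evaluates it for a made-up team. The helper name `pyt_wpct` and the run totals are assumptions for illustration only; they do not come from `ch4_data`.

```r
# Pythagorean expectation with a general exponent k
# (k = 2 reproduces the formula above).
# pyt_wpct is a hypothetical helper name, not part of the chapter's code.
pyt_wpct <- function(R, RA, k = 2) {
  R^k / (R^k + RA^k)
}

# Hypothetical team: 800 runs scored, 700 runs allowed
pyt_wpct(800, 700)
# roughly 0.566

# A team that scores exactly as many runs as it allows is expected to win half its games
pyt_wpct(750, 750)
```

Keeping the exponent as an argument makes it easy to try values other than 2 later on.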

ch4_data <- ch4_data |>
  mutate(Wpct_pyt = R^2 / (R^2 + RA^2),
         resid_pyt = Wpct - Wpct_pyt)

# RMSE with exponent of 2
sqrt(mean(ch4_data$resid_pyt^2))
## [1] 0.02570405

4.6.1 What should the exponent be?

\[\frac{W}{W+L} = W_{\text{pct}} \approx \widehat{W_{\text{pct}}} = \frac{R^{k}}{R^{k} + {RA^{k}}}\]

Algebra (rearranging the approximation so that k can be estimated as a regression slope):

\[\begin{array}{rcl} \frac{W}{W+L} = W_{\text{pct}} & \approx & \widehat{W_{\text{pct}}} = \frac{R^{k}}{R^{k} + RA^{k}} \\ \frac{W}{W+L} & \approx & \frac{R^{k}}{R^{k} + RA^{k}} \\ W \cdot R^{k} + W \cdot RA^{k} & \approx & W \cdot R^{k} + L \cdot R^{k} \\ W \cdot RA^{k} & \approx & L \cdot R^{k} \\ \frac{W}{L} \cdot RA^{k} & \approx & R^{k} \\ \frac{W}{L} & \approx & \frac{R^{k}}{RA^{k}} \\ \frac{W}{L} & \approx & \left(\frac{R}{RA}\right)^{k} \\ \ln\frac{W}{L} & \approx & \ln\left[\left(\frac{R}{RA}\right)^{k}\right] \end{array}\]

\[\ln\frac{W}{L} \approx k\ln\left(\frac{R}{RA}\right)\]

ch4_data <- ch4_data |>
  mutate(logWratio = log(W/L),
         logRratio = log(R/RA))
# "0 +" removes the intercept, so the fitted slope is the estimate of k
pyt_fit <- lm(logWratio ~ 0 + logRratio, data = ch4_data)
pyt_fit$coefficients
## logRratio 
##  1.834988
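
Because the model has no intercept, the least-squares estimate of k has a simple closed form: the sum of the products of the two log ratios divided by the sum of the squared log run ratios. The sketch below checks this on a few made-up points; the vectors `x` and `y` are illustrative stand-ins and are not taken from `ch4_data`.

```r
# Toy data, made up for illustration only (not from ch4_data)
x <- c(-0.20, -0.05, 0.10, 0.25)   # stands in for log(R/RA)
y <- c(-0.37, -0.10, 0.18, 0.46)   # stands in for log(W/L)

# Slope from lm() with the intercept removed
k_lm <- unname(coef(lm(y ~ 0 + x)))

# Closed-form least-squares slope for a line through the origin
k_closed <- sum(x * y) / sum(x^2)

# The two estimates agree up to floating-point error
all.equal(k_lm, k_closed)
```

Forcing the line through the origin matches the algebra: a team with R equal to RA has a log run ratio of 0 and should have a log win ratio of 0 as well.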
ch4_data <- ch4_data |>
  mutate(Wpct_pyt = R^1.835 / (R^1.835 + RA^1.835),
         resid_pyt = Wpct - Wpct_pyt)

# RMSE with exponent of 1.835
sqrt(mean(ch4_data$resid_pyt^2))
## [1] 0.02494779
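
To put the two RMSE values on a more familiar scale, they can be converted from winning percentage to wins over a 162-game season. This is a small arithmetic sketch that reuses the two numbers printed above; the vector name `rmse` and its element names are assumptions for illustration.

```r
# RMSE values reported above, in winning-percentage units
rmse <- c(exp_2 = 0.02570405, exp_1.835 = 0.02494779)

# Typical size of the error in wins over a 162-game season
round(rmse * 162, 2)
# about 4.16 wins with exponent 2 and 4.04 wins with exponent 1.835
```

The fitted exponent improves the estimate only slightly, by roughly a tenth of a win per team-season.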

4.6.2 Luck

We can find the expected number of wins for a full season by multiplying the estimated win percentage (from the Pythagorean formula with an exponent of 1.835) by 162 games.
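
The calculation for a single team can be sketched as follows. The run totals and win total are made up for illustration and are not a row of `ch4_data`; `win_diff` is a hypothetical name for the quantity the tables below call `diff`.

```r
# Hypothetical team (not from ch4_data): runs scored, runs allowed, actual wins
R  <- 800
RA <- 700
W  <- 95

k <- 1.835                        # exponent estimated by the regression above
Wpct_pyt <- R^k / (R^k + RA^k)    # estimated winning percentage
W_pyt    <- Wpct_pyt * 162        # expected wins in a 162-game season
win_diff <- W - W_pyt             # positive: more wins than the run totals predict

round(c(W_pyt = W_pyt, win_diff = win_diff), 1)
```

A positive difference is labeled "lucky" in the tables and a negative one "unlucky".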

2011 Season
Performance vs Pythag Expectation
teamID W W_pyt playoff_bool diff desc
DET 95 88.5 made playoffs 6.5 lucky
SFN 86 80.0 missed playoffs 6.0 lucky
MIL 96 90.1 made playoffs 5.9 lucky
ARI 94 88.3 made playoffs 5.7 lucky
CLE 80 75.3 missed playoffs 4.7 lucky
ATL 89 85.3 missed playoffs 3.7 lucky
CHA 79 75.3 missed playoffs 3.7 lucky
PIT 72 69.6 missed playoffs 2.4 lucky
BAL 69 66.7 missed playoffs 2.3 lucky
SLN 90 88.1 made playoffs 1.9 lucky
TOR 81 79.2 missed playoffs 1.8 lucky
WAS 80 78.8 missed playoffs 1.2 lucky
LAA 86 84.9 missed playoffs 1.1 lucky
MIN 63 61.9 missed playoffs 1.1 lucky
CHN 71 70.3 missed playoffs 0.7 lucky
SEA 67 66.7 missed playoffs 0.3 lucky
FLO 72 72.4 missed playoffs −0.4 unlucky
TBA 91 91.4 made playoffs −0.4 unlucky
PHI 102 102.6 made playoffs −0.6 unlucky
NYN 77 78.6 missed playoffs −1.6 unlucky
TEX 96 98.1 made playoffs −2.1 unlucky
LAN 82 84.8 missed playoffs −2.8 unlucky
OAK 74 77.2 missed playoffs −3.2 unlucky
CIN 79 82.5 missed playoffs −3.5 unlucky
BOS 90 93.7 missed playoffs −3.7 unlucky
COL 73 77.2 missed playoffs −4.2 unlucky
NYA 97 101.2 made playoffs −4.2 unlucky
HOU 56 62.2 missed playoffs −6.2 unlucky
KCA 71 77.8 missed playoffs −6.8 unlucky
SDN 71 78.8 missed playoffs −7.8 unlucky
Code used to build the 2011 table:
ch4_data |>
  filter(yearID == 2011) |>
  # expected wins over a 162-game season, using the 1.835-exponent estimate
  mutate(W_pyt = Wpct_pyt * 162) |>
  select(teamID, W, W_pyt, playoff_bool) |>
  # positive diff: the team won more games than its run totals predict
  mutate(diff = W - W_pyt,
         desc = ifelse(diff > 0, "lucky", "unlucky")) |>
  # desc() here is dplyr's sorting helper, not the desc column created above
  arrange(desc(diff)) |>
  gt() |>
  cols_align(align = "center") |>
  data_color(columns = diff,
             palette = "viridis") |>
  data_color(columns = playoff_bool,
             palette = "viridis",
             reverse = TRUE) |>
  fmt_number(columns = c(W_pyt, diff),
             decimals = 1) |>
  tab_header(title = "2011 Season",
             subtitle = "Performance vs Pythag Expectation")
2023 Season
Performance vs Pythag Expectation
teamID W W_pyt playoff_bool diff desc
MIA 84 74.9 made playoffs 9.1 lucky
BAL 101 93.8 made playoffs 7.2 lucky
DET 78 72.6 missed playoffs 5.4 lucky
PIT 76 71.2 missed playoffs 4.8 lucky
CIN 82 77.5 missed playoffs 4.5 lucky
ARI 84 79.5 made playoffs 4.5 lucky
WAS 71 67.1 missed playoffs 3.9 lucky
NYA 82 78.3 missed playoffs 3.7 lucky
SFN 79 76.2 missed playoffs 2.8 lucky
ATL 104 101.3 made playoffs 2.7 lucky
MIL 92 89.7 made playoffs 2.3 lucky
OAK 50 48.9 missed playoffs 1.1 lucky
PHI 90 88.9 made playoffs 1.1 lucky
SLN 71 70.5 missed playoffs 0.5 lucky
LAA 73 72.5 missed playoffs 0.5 lucky
TOR 89 88.8 made playoffs 0.2 lucky
LAN 100 99.9 made playoffs 0.1 lucky
CHA 61 61.2 missed playoffs −0.2 unlucky
TBA 99 99.8 made playoffs −0.8 unlucky
CLE 76 77.2 missed playoffs −1.2 unlucky
COL 59 60.4 missed playoffs −1.4 unlucky
BOS 78 80.6 missed playoffs −2.6 unlucky
SEA 88 91.3 missed playoffs −3.3 unlucky
HOU 90 93.5 made playoffs −3.5 unlucky
NYN 75 79.8 missed playoffs −4.8 unlucky
TEX 90 96.2 made playoffs −6.2 unlucky
MIN 87 93.2 made playoffs −6.2 unlucky
CHN 83 90.2 missed playoffs −7.2 unlucky
KCA 56 63.5 missed playoffs −7.5 unlucky
SDN 82 92.0 missed playoffs −10.0 unlucky