13.13 The Family-Wise Error Rate

If the null hypothesis is true for each of $m$ independent hypothesis tests, then the FWER is equal to $1-(1-\alpha)^m$ .

We can use this expression to compute the FWER for $m=1,\ldots, 500$ and $\alpha=0.05$ , $0.01$ , and $0.001$ .

m <- 1:500
fwe1 <- 1 - (1 - 0.05)^m
fwe2 <- 1 - (1 - 0.01)^m
fwe3 <- 1 - (1 - 0.001)^m

We now conduct a one-sample $t$ -test for each of the first five managers in the Fund dataset, in order to test the null hypothesis that the $j$ th fund manager’s mean return equals zero, $H_{0j}: \mu_j=0$ .

library(ISLR2)
fund.mini <- Fund[, 1:5]

t.test(fund.mini[, 1], mu = 0)

## 
##  One Sample t-test
## 
## data:  fund.mini[, 1]
## t = 2.8604, df = 49, p-value = 0.006202
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.8923397 5.1076603
## sample estimates:
## mean of x 
##         3

fund.pvalue <- rep(0, 5)


for (i in 1:5)
  fund.pvalue[i] <- t.test(fund.mini[, i], mu = 0)$p.value

fund.pvalue

## [1] 0.006202355 0.918271152 0.011600983 0.600539601 0.755781508

We will make a correction with Bonferroni’s method and Holm’s method to control the FWER.

To do this, we use the p.adjust() function.

In other words, the adjusted $p$ -values resulting from the p.adjust() function can be compared to the desired FWER in order to determine whether or not to reject each hypothesis.

p.adjust(fund.pvalue, method = "bonferroni")

## [1] 0.03101178 1.00000000 0.05800491 1.00000000 1.00000000

pmin(fund.pvalue * 5, 1)

## [1] 0.03101178 1.00000000 0.05800491 1.00000000 1.00000000

Therefore, using Bonferroni’s method, we are able to reject the null hypothesis only for Manager One while controlling the FWER at $0.05$ .

By contrast, using Holm’s method, the adjusted $p$ -values indicate that we can reject the null hypotheses for Managers One and Three at a FWER of $0.05$ .

p.adjust(fund.pvalue, method = "holm")

## [1] 0.03101178 1.00000000 0.04640393 1.00000000 1.00000000

Manager One performs well, whereas Manager Two has poor performance.

apply(fund.mini, 2, mean)

## Manager1 Manager2 Manager3 Manager4 Manager5 
##      3.0     -0.1      2.8      0.5      0.3

t.test(fund.mini[, 1], fund.mini[, 2], paired = T)

## 
##  Paired t-test
## 
## data:  fund.mini[, 1] and fund.mini[, 2]
## t = 2.128, df = 49, p-value = 0.03839
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  0.1725378 6.0274622
## sample estimates:
## mean difference 
##             3.1

Here, we use the TukeyHSD() function to apply Tukey’s methodin order to adjust for multiple testing.

conflicted::conflict_prefer("as.matrix", "base")

## [conflicted] Will prefer base::as.matrix over any other package.

returns <- as.vector(base::as.matrix(fund.mini))
manager <- rep(c("1", "2", "3", "4", "5"), rep(50, 5))
a1 <- aov(returns ~ manager)
TukeyHSD(x = a1)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = returns ~ manager)
## 
## $manager
##     diff        lwr       upr     p adj
## 2-1 -3.1 -6.9865435 0.7865435 0.1861585
## 3-1 -0.2 -4.0865435 3.6865435 0.9999095
## 4-1 -2.5 -6.3865435 1.3865435 0.3948292
## 5-1 -2.7 -6.5865435 1.1865435 0.3151702
## 3-2  2.9 -0.9865435 6.7865435 0.2452611
## 4-2  0.6 -3.2865435 4.4865435 0.9932010
## 5-2  0.4 -3.4865435 4.2865435 0.9985924
## 4-3 -2.3 -6.1865435 1.5865435 0.4819994
## 5-3 -2.5 -6.3865435 1.3865435 0.3948292
## 5-4 -0.2 -4.0865435 3.6865435 0.9999095

mean(TukeyHSD(x = a1)$manager[,4])

## [1] 0.600986

The TukeyHSD() function provides confidence intervals for the difference between each pair of managers (lwr and upr), as well as a $p$ -value.

All of these quantities have been adjusted for multiple testing.

Let’s plot the confidence intervals for the pairwise comparisons using the plot() function.

plot(TukeyHSD(x = a1))