Estimating a difference = regressing on an indicator variable
Add a new group:
n_1 <- 30
y_1 <- rnorm(n_1, 8.0, 5.0)
diff <- mean(y_1) - mean(y_0)
se_0 <- sd(y_0)/sqrt(n_0)
se_1 <- sd(y_1)/sqrt(n_1)
se <- sqrt(se_0^2 + se_1^2)
cat(paste0("Diff: ",diff," Se: ", se))
## Diff: 4.76270151940131 Se: 1.72310260696003
Compare to the true difference of 6.0: the estimate (4.8) is within one standard error (1.7) of it.
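The standard error computed above can be written out as a formula. This is a sketch of what the three `se` lines do, assuming the two groups are independent samples; the symbols $s_0$ and $s_1$ (the sample standard deviations `sd(y_0)` and `sd(y_1)`) are notation introduced here, not in the original code:

```latex
\mathrm{se}_0 = \frac{s_0}{\sqrt{n_0}}, \qquad
\mathrm{se}_1 = \frac{s_1}{\sqrt{n_1}}, \qquad
\mathrm{se}_{\mathrm{diff}} = \sqrt{\mathrm{se}_0^2 + \mathrm{se}_1^2}
                            = \sqrt{\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}}
```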
As a regression (again with flat priors):
n <- n_0 + n_1
y <- c(y_0, y_1)
x <- c(rep(0, n_0), rep(1, n_1))
fake <- data.frame(x, y)
fit <- stan_glm(y ~ x, data=fake, prior_intercept=NULL, prior=NULL, prior_aux=NULL, refresh=0)
print(fit, detail=FALSE)
##             Median MAD_SD
## (Intercept) 2.9    1.3
## x           4.8    1.7
##
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 5.7    0.6
The indicator slope (4.8) is the same as the difference in means (4.76)
The MAD_SD of the slope (1.7) is nearly the same as the SE (1.72)
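The equivalence can also be checked without Stan. The sketch below uses base R's `lm()` in place of `stan_glm()`, so it is a least-squares analogue rather than the fit shown above. The control group (`n_0 <- 20`, mean 2.0, sd 5.0) is not defined in this section and is an assumption, chosen to be consistent with the stated true difference of 6.0. With a 0/1 predictor, the least-squares intercept is the mean of the `x = 0` group and the slope is the difference in group means; the regression standard error pools the two group variances, so it should be close to, but not identical to, the unpooled `se`.

```r
# Assumed setup (not shown in this section): a control group of 20
# draws with mean 2.0, so the true difference from the new group is 6.0.
set.seed(1)
n_0 <- 20
y_0 <- rnorm(n_0, 2.0, 5.0)
n_1 <- 30
y_1 <- rnorm(n_1, 8.0, 5.0)

# Direct comparison of the two group means (unpooled standard error)
diff <- mean(y_1) - mean(y_0)
se <- sqrt(sd(y_0)^2/n_0 + sd(y_1)^2/n_1)

# Least-squares regression on the 0/1 indicator
fake <- data.frame(x = c(rep(0, n_0), rep(1, n_1)), y = c(y_0, y_1))
fit_lm <- lm(y ~ x, data = fake)
slope <- unname(coef(fit_lm)["x"])
slope_se <- summary(fit_lm)$coefficients["x", "Std. Error"]

cat("diff:", diff, " slope:", slope, "\n")
cat("se:", se, " slope se:", slope_se, "\n")
```

The slope and `diff` should agree to numerical precision, while the two standard errors differ slightly because of the pooling.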
Fake data simulation is a general tool that will continue to be helpful in more complicated settings.
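One way to use fake data simulation here is to repeat the whole procedure many times and count how often the interval `diff ± 2*se` contains the true difference. This is a minimal sketch under assumed settings: the control-group parameters (20 draws, mean 2.0, sd 5.0), the number of simulations, and the names `n_sims`, `true_diff`, and `covered` are all illustrative choices, not taken from the original.

```r
# Repeated fake-data simulation: how often does diff +/- 2*se
# cover the true difference? Control-group settings are assumed.
set.seed(123)
n_sims <- 1000
true_diff <- 6.0
n_0 <- 20
n_1 <- 30
covered <- rep(NA, n_sims)
for (s in 1:n_sims) {
  y_0 <- rnorm(n_0, 2.0, 5.0)
  y_1 <- rnorm(n_1, 8.0, 5.0)
  diff <- mean(y_1) - mean(y_0)
  se <- sqrt(sd(y_0)^2/n_0 + sd(y_1)^2/n_1)
  covered[s] <- abs(diff - true_diff) < 2*se
}
cat("Proportion of intervals covering the true difference:", mean(covered), "\n")
```

If the standard error is doing its job, the proportion should come out in the neighborhood of 95%.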