Correlation and independence

  • The correlation coefficient is defined as:

\[ \rho = \frac{Cov(X,Y)}{\sqrt{Var[X]Var[Y]}} \]

  • If X and Y are independent, then they are uncorrelated (the book gives the proof):

\[ Cov(X,Y) = 0 \\ \text{and so, since } Cov(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y], \\ \mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y] \]

  • Note that this is a “one-way door”: covariance can be zero for dependent variables, so uncorrelated does not imply independent.
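
  • A minimal worked counterexample (an illustration, not taken from the book): let X take the values -1, 0, 1 with probability 1/3 each, and set Y = X². Y is fully determined by X, so the two are dependent, yet the covariance vanishes:

\[ \mathbb{E}[X] = 0, \qquad \mathbb{E}[XY] = \mathbb{E}[X^3] = \tfrac{1}{3}\left((-1)^3 + 0^3 + 1^3\right) = 0, \qquad Cov(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] = 0 \]

  • The dependence shows up in the conditional distribution instead: P(Y = 0 | X = 0) = 1, while P(Y = 0) = 1/3.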

  • Computing from data:

\[ \hat{\rho} = \frac{\frac{1}{N}\sum_{n=1}^N x_n y_n - \bar{x}\bar{y}}{\sqrt{\frac{1}{N}\sum_{n=1}^N(x_n-\bar{x})^2}\sqrt{\frac{1}{N}\sum_{n=1}^N(y_n-\bar{y})^2}} \]

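library(MASS)     # mvrnorm
library(tibble)   # tibble
library(ggplot2)  # ggplot, geom_point

# Covariance matrix: Var[X] = 3, Var[Y] = 1, Cov(X,Y) = 1
sigma <- matrix(c(3, 1, 1, 1), nrow = 2)

# Draw 10,000 samples from a zero-mean bivariate normal
m <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = sigma)
data <- tibble(x = m[, 1], y = m[, 2])

# Sample correlation
print(cor(data$x, data$y))
## [1] 0.5746705
# Theoretical value: Cov(X,Y) / sqrt(Var[X] Var[Y]) = 1 / sqrt(3 * 1)
1/sqrt(3)
## [1] 0.5773503
data |> ggplot(aes(x = x, y = y)) + geom_point()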
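
  • The estimator formula can also be coded directly and checked against `cor()`. The following is a minimal sketch, not part of the original notes: the helper name `rho_hat`, the seed, and the base-R construction of the sample (chosen so that Var[X] = 3, Var[Y] = 1, Cov(X,Y) = 1 without needing `mvrnorm`) are all assumptions made for illustration.

```r
# Hypothetical helper: the sample correlation written exactly as in the
# estimator formula (1/N normalisation in numerator and denominator).
rho_hat <- function(x, y) {
  stopifnot(length(x) == length(y))
  x_bar <- mean(x)
  y_bar <- mean(y)
  num <- mean(x * y) - x_bar * y_bar
  den <- sqrt(mean((x - x_bar)^2)) * sqrt(mean((y - y_bar)^2))
  num / den
}

# Illustrative data (assumed construction): y ~ N(0, 1) and
# x = y + sqrt(2) * z with z ~ N(0, 1) independent of y,
# which gives Var[X] = 3, Var[Y] = 1, Cov(X, Y) = 1.
set.seed(1)
N <- 10000
y <- rnorm(N)
x <- y + sqrt(2) * rnorm(N)

print(rho_hat(x, y))                          # hand-coded estimator
print(cor(x, y))                              # built-in Pearson correlation
print(isTRUE(all.equal(rho_hat(x, y), cor(x, y))))
```

  • `cor()` normalises by N - 1 rather than N, but the factor appears in both the numerator and the denominator and cancels, so the two results should agree up to floating-point error.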