6.5 Exercises

6.5.1 Exercise 7

We take our model to be:

$y_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j + \epsilon_i$

Here the $\epsilon_i$ are IID draws from a normal distribution $N(0,\sigma^2)$. The likelihood is therefore a product of normal densities with means $\mu_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j$ and common standard deviation $\sigma$:

$L \propto e^{-\frac{1}{2\sigma^2}\sum_i{(y_i - (\beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j))^2} }$

We only care about the parts that depend on the $\beta_j$ (treating $\sigma$ as known), so we can ignore the normalization constant.

The posterior is proportional to the product of the likelihood $L$ and the prior $P(\beta)$:

$P(\beta | Data) \propto P(Data | \beta) P(\beta)$

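Taking independent double-exponential (Laplace) priors with mean zero and scale $b$ on each $\beta_j$, so that $P(\beta) \propto \prod_{j=1}^{p} e^{-\lvert \beta_j \rvert/b}$, this becomes

$P(\beta | Data) \propto e^{-\frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} } \prod_{j=1}^{p}e^{-\lvert \beta_j \rvert/b}$

again dropping any constants of proportionality that do not depend on the parameters.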

Now combine the exponentials:

$P(\beta | Data) \propto e^{-\frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} -\sum_{j=1}^{p}\lvert \beta_j \rvert/b}$

The mode of this distribution is the value of the $\beta_j$ at which the exponent is maximized, so to find the mode we need to minimize:

$\frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} + \sum_{j=1}^{p}\lvert \beta_j \rvert/b$

or, after multiplying through by $2 \sigma^2$:

$\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} + \sum_{j=1}^{p} \frac{2\sigma^2}{b} \lvert \beta_j \rvert$

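This has the same form as (6.7), the lasso objective, with $\lambda = 2\sigma^2/b$, so the lasso estimate is the mode of this posterior.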
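As a quick numerical sanity check of the algebra above, the sketch below evaluates the negative log posterior (up to its additive constant) and the lasso objective at a few random coefficient vectors and confirms that they differ only by the factor $2\sigma^2$ when $\lambda = 2\sigma^2/b$. The data, the values of `sigma` and `b`, and the function names are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
sigma, b = 1.5, 0.7              # assumed noise sd and Laplace scale (illustrative values)
X = rng.normal(size=(n, p))      # synthetic predictors
y = rng.normal(size=n)           # synthetic response

def neg_log_posterior(beta0, beta):
    """Negative log posterior under the Laplace prior, dropping additive constants."""
    resid = y - beta0 - X @ beta
    return resid @ resid / (2 * sigma**2) + np.abs(beta).sum() / b

def lasso_objective(beta0, beta, lam):
    """Residual sum of squares plus an L1 penalty on the slopes."""
    resid = y - beta0 - X @ beta
    return resid @ resid + lam * np.abs(beta).sum()

lam = 2 * sigma**2 / b           # the lambda implied by the derivation

# Compare the two objectives at several random parameter values.
diffs = []
for _ in range(5):
    beta0 = rng.normal()
    beta = rng.normal(size=p)
    scaled = 2 * sigma**2 * neg_log_posterior(beta0, beta)
    diffs.append(abs(scaled - lasso_objective(beta0, beta, lam)))

max_abs_diff = max(diffs)
print(max_abs_diff)              # should be at floating-point rounding level
```

Because the two functions agree up to a positive constant factor everywhere, they share the same minimizer, which is all the argument needs.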

It should be clear that if you work through exactly the same steps with a normal prior with mean zero and variance $c$ on each $\beta_j$, proportional to $e^{-\frac{\beta_j^2}{2 c}}$, you end up with the posterior:

$P(\beta | Data) \propto e^{- \frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} -\sum_{j=1}^{p} \frac{\beta_j^2}{2 c}}$

Finding the mode of this posterior is therefore equivalent to finding the minimum of:

$\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} + \sum_{j=1}^{p} \frac{\sigma^2}{c}\beta_j^2$

which has the same form as (6.5), the ridge regression objective, with $\lambda = \sigma^2/c$. This mode is also the posterior mean: the exponent is quadratic in the $\beta_j$, so the posterior is a multivariate normal distribution, and the mean and mode of a normal distribution coincide.
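The ridge case can be checked numerically in the same spirit. The sketch below assumes centred predictors and response so that the intercept can be dropped, uses made-up values for `sigma` and `c`, and compares the closed-form ridge solution with $\lambda = \sigma^2/c$ against the mean of the Gaussian posterior obtained from its precision matrix. It also checks that the gradient of the negative log posterior vanishes at the ridge solution, confirming it is the mode.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
sigma, c = 0.8, 2.0                  # assumed noise sd and prior variance (illustrative values)

X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                  # centre predictors so the intercept drops out
y = X @ rng.normal(size=p) + sigma * rng.normal(size=n)
y -= y.mean()                        # centre the response as well

lam = sigma**2 / c                   # the lambda implied by the derivation

# Closed-form ridge estimate: (X'X + lam I)^{-1} X'y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gaussian posterior: precision = X'X / sigma^2 + I / c, mean = precision^{-1} X'y / sigma^2
precision = X.T @ X / sigma**2 + np.eye(p) / c
posterior_mean = np.linalg.solve(precision, X.T @ y / sigma**2)

# Gradient of the negative log posterior, evaluated at the ridge estimate
grad = -X.T @ (y - X @ beta_ridge) / sigma**2 + beta_ridge / c

print(np.allclose(beta_ridge, posterior_mean))
print(np.abs(grad).max())            # should be at floating-point rounding level
```

Multiplying the precision-matrix equation through by $\sigma^2$ gives exactly the ridge normal equations, which is why the two solves agree.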