6.5 Exercises

6.5.1 Exercise 7

We take our model to be:

\[ y_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j + \epsilon_i \]

Here the \(\epsilon_i\) are IID draws from a normal distribution \(N(0,\sigma^2)\). The likelihood is then a product of normal densities, the \(i\)-th having mean \(\mu_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j\) and standard deviation \(\sigma\):

\[ L \propto e^{-\frac{1}{2\sigma^2}\sum_i{(y_i - (\beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j))^2} }\] We only care about the parts that depend on the \(\beta_j\), so we can ignore the normalization constant.
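As a quick numerical sanity check on dropping the normalization, the sketch below compares the full Gaussian log-likelihood with the exponent kept above. The data, the sizes `n` and `p`, and the value of `sigma` are made up for illustration; the point is only that the two differ by the same constant for every choice of coefficients.

```python
import numpy as np

# Hypothetical data, used only to illustrate the algebra.
rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 1.5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

def full_log_likelihood(beta0, beta):
    """Log of the product of N(mu_i, sigma^2) densities, constants included."""
    mu = beta0 + X @ beta
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (y - mu) ** 2 / (2 * sigma**2))

def kernel_log_likelihood(beta0, beta):
    """Only the part that depends on the coefficients: -RSS / (2 sigma^2)."""
    rss = np.sum((y - beta0 - X @ beta) ** 2)
    return -rss / (2 * sigma**2)

# The gap between the two should not depend on the coefficients.
gaps = []
for _ in range(5):
    b0, b = rng.normal(), rng.normal(size=p)
    gaps.append(full_log_likelihood(b0, b) - kernel_log_likelihood(b0, b))

constant = -0.5 * n * np.log(2 * np.pi * sigma**2)
print(np.allclose(gaps, constant))
```

Because the gap is a constant, it cannot change where the maximum over the coefficients sits, which is why it is safe to drop.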

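The posterior is proportional to the product of \(L\) and the prior, which for this part is an independent double-exponential (Laplace) density with mean zero and scale \(b\) on each \(\beta_j\), \(p(\beta_j) = \frac{1}{2b} e^{-\lvert \beta_j \rvert / b}\):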

\[ P(\beta | Data) \propto P(Data | \beta) P(\beta)\]

\[ P(\beta | Data) \propto e^{-\frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} } \prod_{j=1}^{p}e^{-\lvert \beta_j \rvert/b}\] again dropping any constants of proportionality that do not depend on the parameters.

Now combine the exponentials:

\[ P(\beta | Data) \propto e^{-\frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} -\sum_{j=1}^{p}\lvert \beta_j \rvert/b}\]

The mode of this distribution is the value of the \(\beta_j\) at which the exponent is maximized, so to find the mode we need to minimize:

\[ \frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} + \sum_{j=1}^{p}\lvert \beta_j \rvert/b\] or, after multiplying through by \(2 \sigma^2\):

\[ \sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} + \sum_{j=1}^{p} \frac{2\sigma^2}{b} \lvert \beta_j \rvert\]

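This has the same form as 6.7, the lasso objective, with \(\lambda = 2\sigma^2/b\), so the lasso estimate is the mode of this posterior.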
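The correspondence \(\lambda = 2\sigma^2/b\) can be checked numerically. The sketch below uses made-up data and arbitrary values of `sigma` and `b`; the function names are hypothetical. It confirms that the lasso objective is exactly \(2\sigma^2\) times the negative log posterior at every coefficient vector, so the two share the same minimizer.

```python
import numpy as np

# Hypothetical data and hyperparameters, used only for illustration.
rng = np.random.default_rng(1)
n, p = 40, 4
sigma, b = 0.8, 0.5
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=sigma, size=n)

def neg_log_posterior(beta0, beta):
    """Negative exponent of the posterior under the Laplace prior."""
    rss = np.sum((y - beta0 - X @ beta) ** 2)
    return rss / (2 * sigma**2) + np.sum(np.abs(beta)) / b

def lasso_objective(beta0, beta, lam):
    """RSS plus an L1 penalty on the slopes (intercept unpenalized)."""
    rss = np.sum((y - beta0 - X @ beta) ** 2)
    return rss + lam * np.sum(np.abs(beta))

lam = 2 * sigma**2 / b

# The ratio of the two objectives at random coefficient vectors.
ratios = []
for _ in range(5):
    b0, beta = rng.normal(), rng.normal(size=p)
    ratios.append(lasso_objective(b0, beta, lam) / neg_log_posterior(b0, beta))

print(np.allclose(ratios, 2 * sigma**2))
```

Scaling an objective by a positive constant does not move its minimum, which is the whole content of the "multiply through by \(2\sigma^2\)" step.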

It should be clear that working through exactly the same steps with a normal prior on each \(\beta_j\), proportional to \(e^{-\frac{\beta_j^2}{2 c}}\) (mean zero, variance \(c\)), gives the posterior:

\[ P(\beta | Data) \propto e^{- \frac{1}{2\sigma^2}\sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} -\sum_{j=1}^{p} \frac{\beta_j^2}{2 c}}\]

Finding the mode of this posterior is then equivalent to finding the minimum of:

\[ \sum_i{(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j)^2} + \sum_{j=1}^{p} \frac{\sigma^2}{c}\beta_j^2\] which has the same form as 6.5 with \(\lambda = \sigma^2/c\). That this mode is also the mean follows because the exponent is quadratic in the \(\beta_j\), so the posterior is a multivariate normal distribution, whose mean and mode coincide.
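The ridge case can also be checked numerically. The sketch below assumes made-up data, arbitrary values of `sigma` and `c`, and a flat prior on the intercept \(\beta_0\) (matching the product over \(j = 1, \dots, p\) above). It computes the ridge solution with \(\lambda = \sigma^2/c\), the mean of the Gaussian posterior from its precision matrix, and the gradient of the negative log posterior at the ridge solution.

```python
import numpy as np

# Hypothetical data and hyperparameters, used only for illustration.
rng = np.random.default_rng(2)
n, p = 60, 3
sigma, c = 1.0, 2.0
X = rng.normal(size=(n, p))
y = 0.5 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=sigma, size=n)

# Design matrix with an intercept column; D penalizes only the slopes
# (assumption: flat prior on the intercept).
Z = np.column_stack([np.ones(n), X])
D = np.diag([0.0] + [1.0] * p)

# Ridge estimate with lambda = sigma^2 / c.
lam = sigma**2 / c
ridge = np.linalg.solve(Z.T @ Z + lam * D, Z.T @ y)

# Mean of the Gaussian posterior: precision^{-1} Z^T y / sigma^2.
precision = Z.T @ Z / sigma**2 + D / c
posterior_mean = np.linalg.solve(precision, Z.T @ y / sigma**2)

# Gradient of the negative log posterior, evaluated at the ridge estimate.
grad = -(Z.T @ (y - Z @ ridge)) / sigma**2 + (D @ ridge) / c

print(np.allclose(ridge, posterior_mean), np.allclose(grad, 0.0, atol=1e-8))
```

A vanishing gradient shows the ridge estimate is the posterior mode, and agreement with `posterior_mean` shows the mode and mean coincide in this Gaussian case.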