13.4 Latent-data formulation

Alternative formulation using a continuous ‘latent’ variable $z_i$. It is completely equivalent to logistic regression.
‘Latent’ means unobserved.

$$ \begin{aligned} y_i &= \begin{cases} 1 & \text{if } z_i > 0 \\ 0 & \text{if } z_i < 0 \end{cases} \\ z_i &= X_i\beta + \epsilon_i \end{aligned} $$

Here the $\epsilon_i$ are independent and have the logistic distribution:

$$ \Pr(\epsilon_i < x) = \text{logit}^{-1}(x) \quad \text{for all } x $$
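
The equivalence can be checked by simulation. The sketch below is illustrative only: the intercept `a`, slope `b`, predictor values `xs`, and helper names are all made up for this example. It draws standard logistic errors by inverting the CDF $\text{logit}^{-1}$, forms $z = a + bx + \epsilon$, and compares the fraction of draws with $z > 0$ to $\text{logit}^{-1}(a + bx)$.

```python
import math
import random


def inv_logit(x):
    """Inverse logit, which is also the standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_logistic(rng):
    """Draw from the standard logistic distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_pr_y1(linpred, n_sims, rng):
    """Fraction of simulated latent values z = linpred + eps that exceed 0."""
    hits = sum(1 for _ in range(n_sims) if linpred + draw_logistic(rng) > 0)
    return hits / n_sims


rng = random.Random(0)
a, b = -1.0, 2.0             # hypothetical intercept and slope
xs = [-1.0, 0.0, 0.5, 2.0]   # hypothetical predictor values

results = []
for x in xs:
    linpred = a + b * x
    results.append((x, simulate_pr_y1(linpred, 50_000, rng), inv_logit(linpred)))

for x, simulated, exact in results:
    print(f"x = {x:5.2f}  simulated = {simulated:.3f}  invlogit = {exact:.3f}")
```

Up to simulation noise, the two columns should agree, since $\Pr(z_i > 0) = \Pr(\epsilon_i > -X_i\beta) = \text{logit}^{-1}(X_i\beta)$.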

The logistic distribution of the error terms is close to a Gaussian with $\sigma = 1.6$. What if we relaxed that and used a Gaussian with a fitted $\sigma$?
Answer: It won't work, because the ‘latent’ scale parameter $\sigma$ is non-identifiable. You can pick any $\sigma$ you want and get the same predictions by scaling the slope and intercept by the same factor!
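
A small numerical sketch of both points, assuming Gaussian latent errors so that $\Pr(y_i = 1) = \Phi(X_i\beta / \sigma)$. The values of `a`, `b`, `sigma`, and the rescaling factor `c` are hypothetical, chosen only for illustration.

```python
import math


def inv_logit(x):
    """Inverse logit (standard logistic CDF)."""
    return 1.0 / (1.0 + math.exp(-x))


def norm_cdf(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pr_y1(x, a, b, sigma):
    """Pr(y = 1) when z = a + b*x + eps and eps ~ Normal(0, sigma)."""
    return norm_cdf((a + b * x) / sigma)


# 1. The logistic CDF is close to a Normal(0, 1.6) CDF over a grid of x values.
grid = [i / 100.0 for i in range(-600, 601)]
approx_gap = max(abs(inv_logit(x) - norm_cdf(x / 1.6)) for x in grid)

# 2. Non-identifiability: multiplying intercept, slope, and sigma by the same
#    constant c leaves every predicted probability unchanged.
a, b, sigma = -1.0, 2.0, 1.6   # hypothetical values
c = 3.0                        # arbitrary rescaling factor
xs = [-1.0, 0.0, 0.5, 2.0]     # hypothetical predictor values

original = [pr_y1(x, a, b, sigma) for x in xs]
rescaled = [pr_y1(x, c * a, c * b, c * sigma) for x in xs]
max_diff = max(abs(p - q) for p, q in zip(original, rescaled))

print(f"largest logistic-vs-normal gap: {approx_gap:.4f}")
print(f"largest change after rescaling: {max_diff:.2e}")
```

Because only the ratio $X_i\beta / \sigma$ enters the probability, the data cannot distinguish $(\beta, \sigma)$ from $(c\beta, c\sigma)$, which is why $\sigma$ has to be fixed rather than fitted.
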
So why bother?
- In some settings, direct information is available about the $z_i$'s.
- As we will see in later chapters, this latent formulation can be useful.