4.2 Why NOT Linear Regression?

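  • There is no natural way to convert a qualitative response variable with more than two levels into a quantitative response that is ready for linear regression. A coding such as the following implies an ordering of the outcomes and equal spacing between them: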

\[Y = \left\{ \begin{array}{ll} 1 & \mbox{if stroke};\\ 2 & \mbox{if epileptic seizure};\\ 3 & \mbox{if drug overdose}.\end{array} \right.\]
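To see why such a coding is a problem, here is a minimal sketch on made-up toy data (the predictor `x`, the labels, and the helper `fit_and_classify` are all hypothetical, not from the text). It fits least squares under two equally arbitrary codings of the same three outcomes and maps each fitted value back to the nearest code. The two codings give different fitted models and different predictions for the same observations.

```python
import numpy as np

def fit_and_classify(x, labels, coding):
    """Least-squares fit of the numeric codes on x, then map each
    fitted value back to the label whose code is nearest."""
    y = np.array([coding[label] for label in labels], dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # intercept + slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    names = list(coding)
    codes = np.array([coding[n] for n in names], dtype=float)
    nearest = np.abs(fitted[:, None] - codes[None, :]).argmin(axis=1)
    return beta, [names[i] for i in nearest]

# Toy data, invented purely for illustration.
x = np.arange(1, 10, dtype=float)
labels = ["stroke"] * 3 + ["seizure"] * 3 + ["overdose"] * 3

# Two codings of the same outcomes; neither ordering is "natural".
coding_a = {"stroke": 1, "seizure": 2, "overdose": 3}
coding_b = {"seizure": 1, "stroke": 2, "overdose": 3}

beta_a, preds_a = fit_and_classify(x, labels, coding_a)
beta_b, preds_b = fit_and_classify(x, labels, coding_b)

print("slope under coding A:", beta_a[1])
print("slope under coding B:", beta_b[1])
print("same predictions?   ", preds_a == preds_b)
```

Nothing in the data changed between the two fits, only the arbitrary numbers attached to the labels, yet the conclusions differ.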

  • In general, a linear regression fit also does not provide meaningful estimates of Pr(Y | X);

  • A binary qualitative response can sometimes be modeled using the dummy variable approach. Example:

\[Y = \left\{ \begin{array}{ll} 0 & \mbox{if stroke};\\ 1 & \mbox{if drug overdose}.\end{array} \right.\]

  • In such cases, a prediction of \(\hat{Y} > 0.5\) can be associated with drug overdose (the level coded as 1), and with stroke otherwise.
  • The main issue is that some estimates might fall outside the [0, 1] probability interval, e.g. Figure 4.2:
Figure 4.2: Classification using the Default data. Left: Estimated probability of default using linear regression. Some estimated probabilities are negative! The orange ticks indicate the 0/1 values coded for default (No or Yes). Right: Predicted probabilities of default using logistic regression. All probabilities lie between 0 and 1.
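
The behavior in the figure can be reproduced in spirit with a small sketch. The data below are made up for illustration and are not the Default data set. The logistic fit uses plain gradient ascent on a standardized predictor, which is a simple stand-in chosen for self-containment rather than the method used to draw the figure. All variable names are assumptions.

```python
import numpy as np

# Toy data (invented; NOT the Default data set).
x = np.arange(20, dtype=float)          # hypothetical predictor, e.g. a balance
y = np.zeros(20)
y[[12, 15, 16, 17, 18, 19]] = 1.0       # 1 = "Yes", 0 = "No"; the classes overlap

# Linear regression on the 0/1 response via least squares.
X = np.column_stack([np.ones_like(x), x])
beta_lin, *_ = np.linalg.lstsq(X, y, rcond=None)
p_linear = X @ beta_lin

# Logistic regression fitted by gradient ascent on the mean log-likelihood.
# The predictor is standardized so a fixed step size of 1.0 is stable.
z = (x - x.mean()) / x.std()
Z = np.column_stack([np.ones_like(z), z])
beta_log = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(Z @ beta_log)))
    beta_log += 1.0 * Z.T @ (y - p) / len(y)
p_logistic = 1.0 / (1.0 + np.exp(-(Z @ beta_log)))

print("linear   min/max:", p_linear.min(), p_linear.max())
print("logistic min/max:", p_logistic.min(), p_logistic.max())
```

On this toy data the linear fit produces negative "probabilities" at the low end of `x`, while the logistic fit stays strictly between 0 and 1 because its output always passes through the sigmoid.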