4.3 Logistic Regression

4.3.1 The Logistic Model

  • Logistic regression: models the probability that the response Y belongs to a particular category, given the predictor X
  • Y is binary (0/1)

\[p(X) = \beta_0 + \beta_1 X \space \Longrightarrow {Linear \space regression}\]

\[p(X) = \frac{e^{\beta_{0} + \beta_{1}X}}{1 + e^{\beta_{0} + \beta_{1}X}} \space \Longrightarrow {Logistic \space function}\]

\[odds = \frac{p(X)}{1 - p(X)} = e^{\beta_{0} + \beta_{1}X} \Longrightarrow {odds, \space taking \space values \space in \space (0, \infty)}\]

Taking the logarithm of both sides, we get

\[\log \biggl(\frac{p(X)}{1- p(X)}\biggr) = \beta_{0} + \beta_{1}X \Longrightarrow {log \space odds/logit}\]
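As a minimal numerical sketch (not code from the text), the snippet below evaluates the logistic function, the odds, and the log-odds with NumPy, and checks that the logit is linear in X. The coefficient values are illustrative assumptions, not estimates from any dataset.

```python
import numpy as np

# Illustrative coefficients (assumptions for this sketch): an intercept and a
# slope on a balance-like predictor.
beta_0, beta_1 = -10.65, 0.0055

x = np.array([1000.0, 1500.0, 2000.0])

# Logistic function: p(X) = e^(b0 + b1 X) / (1 + e^(b0 + b1 X)), always in (0, 1)
linear = beta_0 + beta_1 * x
p = np.exp(linear) / (1 + np.exp(linear))

# Odds p/(1 - p) equal e^(b0 + b1 X), so the log-odds (logit) is linear in X
odds = p / (1 - p)
print(p)
print(np.allclose(np.log(odds), linear))  # True: logit recovers b0 + b1 X
```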

4.3.2 Estimating the Regression Coefficients

To estimate the regression coefficients, we use maximum likelihood estimation (MLE).

Likelihood Function

\[ℓ (\beta_{0}, \beta_{1}) = \prod_{i: y_{i}= 1} p (x_i) \prod_{i': y_{i'}= 0} (1- p (x_{i'})) \Longrightarrow {Likelihood \space function}\]

  • The aim is to find the values of \(\beta_0\) and \(\beta_1\) that maximize \(ℓ\) (a minimal numerical sketch follows this list).
  • The least squares method used to fit linear regression is a special case of maximum likelihood.
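As a hedged illustration, maximizing \(ℓ\) is equivalent to minimizing the negative log-likelihood, which can be done numerically. The sketch below does this on synthetic data with `scipy.optimize.minimize`; the data-generating coefficients and the helper name `neg_log_likelihood` are assumptions for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data, assumed purely for illustration: a single predictor x and a
# binary response y generated from a logistic model with known coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
true_p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))
y = (rng.uniform(size=200) < true_p).astype(float)

def neg_log_likelihood(beta):
    # log ℓ(β0, β1) = Σ_{i: y_i=1} log p(x_i) + Σ_{i': y_i'=0} log(1 − p(x_i'))
    b0, b1 = beta
    p = 1 / (1 + np.exp(-(b0 + b1 * x)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Maximizing ℓ is the same as minimizing −log ℓ
result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)  # MLE estimates of (β0, β1), near the true (0.5, 2.0)
```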

4.3.3 Multiple Logistic Regression

\[\log \biggl(\frac{p(X)}{1- p(X)}\biggr) = \beta_{0} + \beta_{1}X_1 + \cdots + \beta_{p}X_p \\ \Downarrow \\ p(X) = \frac{e^{\beta_{0} + \beta_{1}X_1 + \cdots + \beta_{p}X_p}}{1 + e^{\beta_{0} + \beta_{1}X_1 + \cdots + \beta_{p}X_p}}\]
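A minimal fitting sketch, assuming synthetic data in place of a real dataset such as the book's Default data: the snippet below fits a multiple logistic regression with scikit-learn and reports the estimated coefficients and fitted probabilities. The data-generating coefficients are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in (an assumption for this sketch) for a dataset
# with p = 3 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
logits = -1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]   # assumed true coefficients
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-logits))).astype(int)

# Note: scikit-learn applies mild L2 regularization by default, so the
# estimates approximate, rather than exactly equal, the MLE.
model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)      # estimated β0 and (β1, ..., βp)
print(model.predict_proba(X[:3])[:, 1])   # fitted p(X) for three observations
```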


Figure 4.3: Confounding in the Default data. Left: Default rates are shown for students (orange) and non-students (blue). The solid lines display default rate as a function of balance, while the horizontal broken lines display the overall default rates. Right: Boxplots of balance for students (orange) and non-students (blue) are shown.

4.3.4 Multinomial Logistic Regression

  • Multinomial logistic regression is used in the setting where there are K > 2 classes. We select a single class to serve as the baseline.

  • However, the interpretation of the coefficients in a multinomial logistic regression model must be done with care, since it is tied to the choice of baseline.

  • Alternatively, we can use softmax coding, where we treat all K classes symmetrically rather than selecting a baseline. This means we estimate coefficients for all K classes, rather than for only K − 1 classes: for k = 1, . . . , K,

\[P(Y = k \mid X = x) = \frac{e^{\beta_{k0} + \beta_{k1}x_1 + \cdots + \beta_{kp}x_p}}{\sum_{l=1}^{K} e^{\beta_{l0} + \beta_{l1}x_1 + \cdots + \beta_{lp}x_p}}\]
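As a minimal sketch of softmax coding, the snippet below computes the K class probabilities for a single observation; the coefficient matrix `B` and the observation `x` are assumptions chosen only for illustration.

```python
import numpy as np

# Softmax coding sketch with assumed coefficients: K = 3 classes, p = 2
# predictors; row k of B holds (β_k0, β_k1, β_k2).
B = np.array([[ 0.5,  1.0, -0.5],
              [ 0.0, -1.0,  0.3],
              [-0.5,  0.2,  0.8]])

x = np.array([1.0, 2.0])                  # one observation (x1, x2)
scores = B[:, 0] + B[:, 1:] @ x           # β_k0 + β_k1 x1 + β_k2 x2, one per class

# P(Y = k | X = x) = e^{score_k} / Σ_l e^{score_l}
probs = np.exp(scores) / np.exp(scores).sum()
print(probs, probs.sum())                 # K probabilities that sum to 1
```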