9.3 Building the regression model

9.3.1 Data model

We have \(n\) data pairs of bike ridership (\(Y\)) and temperature (\(X\)):

\[\{(Y_{1}, X_{1}), (Y_{2}, X_{2}), \ldots, (Y_{n}, X_{n}) \}\]

Here prior knowledge suggests a positive linear relationship between ridership and temperature: the warmer it is, the more likely people are to use the bike share service.

We now move from the global mean (\(\mu\)) to a local mean (\(\mu_{i}\), where \(i\) indexes one day). If the relationship is linear:

\[ \mu_{i} = \beta_{0} + \beta_{1}X_{i} \]

\(\beta_{0}\) is the intercept coefficient: the typical ridership when the temperature is 0 degrees F. It is hard to interpret (would you rent a bike when it is 0 degrees F?).

\(\beta_{1}\) is the temperature coefficient: it indicates the typical change in ridership for every one-unit increase in temperature. When there is just one quantitative predictor, it is called the slope.

We can plunk this assumption into our model:

\[ Y_{i} \mid \beta_{0}, \beta_{1}, \sigma \overset{ind}{\sim} N(\mu_{i}, \sigma^2) \quad \text{with} \quad \mu_{i} = \beta_{0} + \beta_{1}X_{i} \]

As you can see, \(\sigma\) now measures the variability of \(Y\) around the local mean \(\mu_{i}\) rather than around the global mean.
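To make the data model concrete, here is a minimal simulation sketch in Python with NumPy. The function name `simulate_ridership`, the temperatures, and the values of \(\beta_{0}\), \(\beta_{1}\) and \(\sigma\) are all hypothetical, chosen only for illustration; they are not estimates from the bike share data.

```python
import numpy as np

def simulate_ridership(x, beta_0, beta_1, sigma, rng):
    """Draw one Y_i per X_i from N(beta_0 + beta_1 * X_i, sigma^2)."""
    x = np.asarray(x, dtype=float)
    mu = beta_0 + beta_1 * x               # local mean mu_i for each day
    y = rng.normal(loc=mu, scale=sigma)    # independent normal draws around mu_i
    return mu, y

rng = np.random.default_rng(seed=1)

# Hypothetical temperatures (degrees F) and parameter values, for illustration only
temps = np.array([45.0, 55.0, 65.0, 75.0, 85.0])
mu, rides = simulate_ridership(temps, beta_0=-2000.0, beta_1=90.0, sigma=1200.0, rng=rng)

for t, m, r in zip(temps, mu, rides):
    print(f"temp={t:.0f}F  local mean={m:.0f}  simulated rides={r:.0f}")
```

Each 10-degree step in temperature shifts the local mean by \(10 \beta_{1}\), while the simulated ridership scatters around that mean with standard deviation \(\sigma\).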

9.3.2 Normal regression assumptions

  • Structure of the data: accounting for \(X\), \(Y\) for one day is independent of \(Y\) for another day

  • Structure of the relationship: the typical \(Y\) can be written as a linear function of predictor \(X\): \(\mu = \beta_{0} + \beta_{1}X\)

  • Structure of the variability: at any value of \(X\), \(Y\) will vary normally around \(\mu\) with a consistent standard deviation \(\sigma\)

9.3.3 Specifying the priors

Quiz: What are our parameters?

Answer:

\[\beta_{0}, \beta_{1}, \sigma\]

First, we assume that our parameters are independent:

\[ \beta_{0} \sim N(m_{0}, s^2_{0} ) \]

\[ \beta_{1} \sim N(m_{1}, s^2_{1} ) \]

\(m_{0}, m_{1}, s_{0}, s_{1}\) are parameters of the priors on our parameters, so they are hyperparameters.

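Because \(\sigma\) is a standard deviation and must be positive, it gets an exponential prior with rate \(l\), another hyperparameter:

\[\sigma \sim \text{Exp}(l)\]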
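A quick way to see what these priors imply is to draw from them. The sketch below assumes NumPy; `sample_priors` is a hypothetical helper, and the hyperparameter values are placeholders rather than tuned choices. Note that NumPy parameterizes the exponential by its scale, which is \(1/l\) when \(l\) is the rate.

```python
import numpy as np

def sample_priors(m0, s0, m1, s1, l, size, rng):
    """Draw independent prior samples of (beta_0, beta_1, sigma)."""
    beta_0 = rng.normal(loc=m0, scale=s0, size=size)   # beta_0 ~ N(m0, s0^2)
    beta_1 = rng.normal(loc=m1, scale=s1, size=size)   # beta_1 ~ N(m1, s1^2)
    sigma = rng.exponential(scale=1.0 / l, size=size)  # sigma ~ Exp(rate l)
    return beta_0, beta_1, sigma

rng = np.random.default_rng(seed=2)

# Hypothetical hyperparameter values, for illustration only
beta_0, beta_1, sigma = sample_priors(
    m0=0.0, s0=2000.0, m1=100.0, s1=40.0, l=0.001, size=5, rng=rng
)
print(beta_0, beta_1, sigma, sep="\n")
```

The normal priors allow \(\beta_{0}\) and \(\beta_{1}\) to take any real value, while every draw of \(\sigma\) is non-negative, with prior mean \(1/l\).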

9.3.4 Putting it all together

\[ Y_{i} \mid \beta_{0}, \beta_{1}, \sigma \overset{ind}{\sim} N(\mu_{i}, \sigma^2) \quad \text{with} \quad \mu_{i} = \beta_{0} + \beta_{1}X_{i} \]

\[ \beta_{0} \sim N(m_{0}, s^2_{0} ) \]

\[ \beta_{1} \sim N(m_{1}, s^2_{1} ) \]

\[\sigma \sim \text{Exp}(l)\]
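The full specification can be read as a recipe for a joint density: the normal likelihood of the data multiplied by the three independent priors. As a rough sketch, assuming NumPy, the log of that product (which is proportional to the posterior) could be written as below. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def log_normal_pdf(value, mean, sd):
    """Log density of N(mean, sd^2) evaluated at value."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - (value - mean) ** 2 / (2.0 * sd**2)

def log_joint(beta_0, beta_1, sigma, x, y, m0, s0, m1, s1, l):
    """Log of likelihood times priors for the normal regression model."""
    if sigma <= 0:
        return -np.inf  # the exponential prior puts no mass on sigma <= 0
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = beta_0 + beta_1 * x                          # local means mu_i
    log_lik = np.sum(log_normal_pdf(y, mu, sigma))    # independent Y_i given parameters
    log_prior = (
        log_normal_pdf(beta_0, m0, s0)                # beta_0 ~ N(m0, s0^2)
        + log_normal_pdf(beta_1, m1, s1)              # beta_1 ~ N(m1, s1^2)
        + np.log(l) - l * sigma                       # sigma ~ Exp(l)
    )
    return log_lik + log_prior

# Toy data and hyperparameters, for illustration only
x_toy = [0.0, 1.0, 2.0]
y_toy = [1.0, 3.0, 5.0]
hyper = dict(m0=0.0, s0=10.0, m1=0.0, s1=10.0, l=1.0)

good = log_joint(1.0, 2.0, 1.0, x_toy, y_toy, **hyper)  # line passes through the data
bad = log_joint(0.0, 0.0, 1.0, x_toy, y_toy, **hyper)   # flat line at zero
print(good > bad)
```

Parameter values whose line fits the toy data well get a higher log joint density than values that fit poorly, which is the comparison a posterior sampler relies on.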

Model building one step at a time!

  1. Determine whether \(Y\) is discrete or continuous \(\rightarrow\) choose an appropriate model for the data

  2. Rewrite the mean of \(Y\) as a function of predictors \(X\) (e.g. \(\mu = \beta_0 + \beta_1 X\))

  3. Identify unknown parameters in your model

  4. Note the values these parameters might take \(\rightarrow\) Identify appropriate priors