8.5 Linear Regression

  • Multiple linear regression lets us examine the relationship between a quantitative response variable and one explanatory variable while holding the other explanatory variables in the model constant.

  • Below is a model that predicts home price (the response variable, in dollars) from the explanatory variables lot size (acres), age (years), land value (dollars), living area (\(ft^2\)), number of bedrooms, number of bathrooms, and whether the home is on the waterfront.

data(SaratogaHouses, package="mosaicData")
houses_lm <- lm(price ~ lotSize + age + landValue +
                  livingArea + bedrooms + bathrooms +
                  waterfront, 
                data = SaratogaHouses)

summary(houses_lm)$coefficients
##                   Estimate   Std. Error   t value     Pr(>|t|)
## (Intercept)   1.398788e+05 1.647293e+04  8.491436 4.345065e-17
## lotSize       7.500792e+03 2.075136e+03  3.614604 3.094673e-04
## age          -1.360401e+02 5.415794e+01 -2.511914 1.209876e-02
## landValue     9.093072e-01 4.583046e-02 19.840672 4.716289e-79
## livingArea    7.517866e+01 4.158113e+00 18.079993 4.954903e-67
## bedrooms     -5.766760e+03 2.388433e+03 -2.414454 1.586262e-02
## bathrooms     2.454711e+04 3.332268e+03  7.366487 2.705486e-13
## waterfrontNo -1.207266e+05 1.560083e+04 -7.738475 1.703303e-14
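To see what "holding the other variables constant" buys us, here is a minimal sketch on simulated data rather than the Saratoga houses. The variables `x1`, `x2`, `y` and the true coefficients 3 and 5 are assumptions chosen for illustration. Because `x1` and `x2` are correlated, a simple regression of `y` on `x1` alone credits `x1` with part of the effect of `x2`, while the multiple regression recovers the effect of `x1` with `x2` held fixed. The last lines show that a coefficient equals the change in the predicted response when that variable rises by one unit and everything else stays the same.

```r
# Simulated data (illustrative assumption, not SaratogaHouses):
# x1 is correlated with x2, and the true model is y = 2 + 3*x1 + 5*x2 + noise.
set.seed(42)
n  <- 1000
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n, sd = 0.6)
y  <- 2 + 3 * x1 + 5 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2)

# Simple regression: the x1 slope also absorbs x2's effect (about 3 + 5 * 0.8 = 7).
simple_fit <- lm(y ~ x1, data = sim)
# Multiple regression: the x1 slope is estimated holding x2 constant (about 3).
multiple_fit <- lm(y ~ x1 + x2, data = sim)

coef(simple_fit)["x1"]
coef(multiple_fit)["x1"]

# Raise x1 by one unit while keeping x2 fixed: the change in the
# prediction equals the x1 coefficient.
base_point <- data.frame(x1 = 0, x2 = 1)
bumped     <- data.frame(x1 = 1, x2 = 1)
pred_change <- predict(multiple_fit, bumped) - predict(multiple_fit, base_point)
pred_change
```

The same reasoning underlies reading the `livingArea` estimate in `houses_lm` as the price change per extra square foot with the other six variables fixed.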
  • We estimate that each additional square foot of living area is associated with an increase of about \(\$75\) in home price, holding the other variables constant.

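  • The reported coefficient, waterfrontNo, compares non-waterfront homes with waterfront homes. We therefore estimate that a waterfront home costs approximately \(\$120,727\) more than a non-waterfront home, again controlling for the other variables in the model.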
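The coefficient is labelled `waterfrontNo` because `"Yes"` is the first level of the `waterfront` factor, so `lm()` uses waterfront homes as the baseline. A minimal sketch of flipping the baseline, assuming the factor's values are exactly `"Yes"` and `"No"` (as the coefficient name suggests): the dummy then appears as `waterfrontYes` with the same magnitude and the opposite sign, while the other slopes are unchanged. The names `houses_ref_no` and `houses_lm_no` are hypothetical, introduced only for this example.

```r
data(SaratogaHouses, package = "mosaicData")

# Original coding: "Yes" is the baseline, so lm() reports waterfrontNo.
houses_lm <- lm(price ~ lotSize + age + landValue + livingArea +
                  bedrooms + bathrooms + waterfront,
                data = SaratogaHouses)

# Hypothetical recoded copy: put "No" first so it becomes the baseline.
houses_ref_no <- SaratogaHouses
houses_ref_no$waterfront <- factor(houses_ref_no$waterfront,
                                   levels = c("No", "Yes"))
houses_lm_no <- update(houses_lm, data = houses_ref_no)

coef(houses_lm)["waterfrontNo"]      # negative: non-waterfront relative to waterfront
coef(houses_lm_no)["waterfrontYes"]  # same size, positive: waterfront relative to non-waterfront
```

Only the intercept and the sign of the waterfront term change; the model's fitted values are identical under either coding.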