4.1 Weighted Regression
Weighted regression gives certain records (observations, i.e. rows of the data) more or less weight when fitting the regression model. To show this, I will use the Ames housing data from the {modeldata} package (part of tidymodels) and give more weight to houses sold more recently than to those sold earlier in these data.
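library(dplyr)      # provides %>%, select(), mutate()
library(modeldata)  # provides the ames data set
data(ames)

dat <- ames %>%
  dplyr::select(Lot_Area, Neighborhood, Year_Sold, First_Flr_SF, Second_Flr_SF,
                Bsmt_Full_Bath, Full_Bath, Half_Bath, Bsmt_Half_Bath, Sale_Price,
                Bedroom_AbvGr, Central_Air, Bldg_Type) %>%
  # weight = years since 2006, so later sales count more (2006 sales get weight 0)
  dplyr::mutate(weight = Year_Sold - 2006,
                total_sf = First_Flr_SF + Second_Flr_SF,
                bath = Bsmt_Full_Bath + Full_Bath + 0.5 * Half_Bath + 0.5 * Bsmt_Half_Bath)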
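# Ordinary (unweighted) least squares fit
house_lm <- lm(Sale_Price ~ total_sf + Lot_Area + bath + Bedroom_AbvGr + Central_Air,
               data = dat)

# Same model, with each record weighted by years since 2006
house_wt <- lm(Sale_Price ~ total_sf + Lot_Area + bath + Bedroom_AbvGr + Central_Air,
               data = dat, weights = weight)

# Compare the coefficients of the two fits side by side
round(cbind(house_lm = house_lm$coefficients,
            house_wt = house_wt$coefficients), digits = 3)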
##                 house_lm   house_wt
## (Intercept)    -4804.696  -3943.203
## total_sf         104.742    104.705
## Lot_Area           0.605      0.673
## bath           25411.278  24754.042
## Bedroom_AbvGr -25438.440 -27218.685
## Central_AirY   41924.976  45938.683
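
To make clearer what the weights are doing in the weighted fit, here is a minimal sketch on small synthetic data (the variables x, y, and w are made up for illustration and are not part of the Ames data). With weights w_i, lm() minimizes the weighted sum of squared residuals, sum(w_i * (y_i - yhat_i)^2). The sketch checks two equivalent views of that: the closed-form solution (X'WX)^{-1} X'Wy, and, for integer weights, simply repeating each row w_i times.

```r
# Synthetic data (hypothetical, for illustration only)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n)
w <- sample(0:4, n, replace = TRUE)  # integer weights, like Year_Sold - 2006

# 1. Weighted fit via lm()
fit_w <- lm(y ~ x, weights = w)

# 2. Closed form: beta = (X'WX)^{-1} X'Wy, with W = diag(w)
X <- cbind(1, x)
beta_closed <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

# 3. Unweighted fit on data where row i is repeated w[i] times
#    (rows with weight 0 disappear entirely)
idx <- rep(seq_len(n), times = w)
d_rep <- data.frame(x = x[idx], y = y[idx])
fit_rep <- lm(y ~ x, data = d_rep)

# All three approaches give the same coefficients
all.equal(unname(coef(fit_w)), as.vector(beta_closed))
all.equal(unname(coef(fit_w)), unname(coef(fit_rep)))
```

Only the coefficients agree across the three approaches. The standard errors from the row-repetition fit differ from those of the weighted fit, because repeating rows inflates the apparent sample size. The sketch also shows that a weight of 0 removes a record from the fit, which is what happens to houses sold in 2006 under the Year_Sold - 2006 weighting used above.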