29.2 Linear models and non-linear models
Linear models assume a relationship of the form:
y = a_1 * x1 + a_2 * x2 + ... + a_n * xn
and assume that the residuals, the distances between the observed and predicted values, are normally distributed.
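As a minimal sketch of the form above, the following simulates data with two predictors and known coefficients, fits the model without an intercept so that it matches the equation, and inspects the residuals. The coefficient values (2 and -1), the sample size and the noise level are arbitrary choices made for illustration.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 * x1 - 1 * x2 + rnorm(n, sd = 0.1)   # a_1 = 2, a_2 = -1 (assumed values)

# "0 +" drops the intercept so the fit matches y = a_1 * x1 + a_2 * x2
fit <- lm(y ~ 0 + x1 + x2)
coef(fit)

# residuals are observed minus predicted values
res <- y - fitted(fit)
all.equal(unname(res), unname(residuals(fit)))

# a normal Q-Q plot is one common visual check of the normality assumption
qqnorm(res)
qqline(res)
```

Points lying close to the reference line in the Q-Q plot suggest the normality assumption is reasonable for these residuals.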
Types of linear models and their extensions:
- Linear models: stats::lm()
- Generalised linear models: stats::glm()
- Generalised additive models: mgcv::gam()
- Penalised linear models: glmnet::glmnet()
- Robust linear models: MASS::rlm()
- Trees: rpart::rpart() (not a linear model; trees attack the problem in a completely different way)
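To illustrate how these families relate, here is a small sketch using only base R. It assumes the built-in mtcars data and an arbitrary choice of predictors; it is meant to show that glm() with a Gaussian family reproduces lm(), and that changing the family gives a different kind of model.

```r
# mtcars ships with base R (datasets package); the predictors are arbitrary
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian())

# with a Gaussian family and identity link, glm() reproduces lm()
all.equal(coef(fit_lm), coef(fit_glm))

# a different family gives a different model, e.g. logistic regression
# for the binary variable am
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial())
coef(fit_logit)
```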
Non-linear models are models that describe a non-linear trend between the predictors and the response.
Some models require predictors that have been centered and scaled:
- neural networks
- K-nearest neighbors
- support vector machines (SVM)
while others require a traditional response surface design model expansion (quadratic terms and two-way interactions).
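Both kinds of preprocessing can be sketched in base R. The example below assumes the built-in mtcars data and an arbitrary choice of columns. It uses scale() for centering and scaling, and a formula with squared terms and a two-way interaction as one possible way to write a response surface expansion.

```r
# mtcars ships with base R; the choice of columns is arbitrary
predictors <- mtcars[, c("wt", "hp", "disp")]

# scale() centers each column to mean 0 and scales it to standard deviation 1
scaled <- scale(predictors)
round(colMeans(scaled), 10)
apply(scaled, 2, sd)

# the constants are stored as attributes, so the same transformation
# can be applied to new data later
attr(scaled, "scaled:center")
attr(scaled, "scaled:scale")

# response surface expansion for two predictors:
# main effects, their two-way interaction, and quadratic terms
rs <- model.matrix(mpg ~ (wt + hp)^2 + I(wt^2) + I(hp^2), data = mtcars)
colnames(rs)
```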
29.2.1 Transformations
We can switch between linear and non-linear models with some transformations:
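library(dplyr)      # mutate(), case_when(), %>%
library(ggplot2)
library(modelr)     # provides the sim3 dataset
library(splines)    # bs()
library(patchwork)  # combines ggplot objects with +

data(sim3)

# numeric weights derived from the categorical variable x2
sim3_2 <- sim3 %>%
  mutate(x3 = case_when(x2 == "a" ~ 1,
                        x2 == "b" ~ 2,
                        x2 == "c" ~ 3,
                        x2 == "d" ~ 4))

# weighted regression
mod3w <- lm(y ~ x1, data = sim3_2, weights = x3)
# note: the smooths drawn in the plots are unweighted
p1 <- ggplot(sim3_2, aes(x1, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +  # straight-line fit, matching the title
  labs(title = "Linear model")

# polynomial transformation
mod3t <- lm(y ~ poly(x1, 3), data = sim3_2, weights = x3)
p2 <- ggplot(sim3_2, aes(x1, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, fill = NA,
              formula = y ~ poly(x, 3, raw = TRUE), colour = "red") +
  labs(title = "Polynomial transf.")

# splines
mod3s <- lm(y ~ bs(x1, 3), data = sim3_2, weights = x3)
p3 <- ggplot(sim3_2, aes(x1, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, fill = NA,
              formula = y ~ splines::bs(x, 3), colour = "red") +
  labs(title = "Spline transf.")

# show the three plots side by side
p1 + p2 + p3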
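In the code above the polynomial model is fitted with poly(x1, 3), while the plot uses poly(x, 3, raw = TRUE). The sketch below, on simulated data with made-up coefficients, suggests why this is harmless: the two parameterisations give different coefficients but the same fitted curve.

```r
set.seed(42)                          # arbitrary seed
x <- seq(1, 10, length.out = 50)
# made-up cubic trend plus noise, for illustration only
y <- 1 + 0.5 * x - 0.2 * x^2 + 0.02 * x^3 + rnorm(50, sd = 0.3)

fit_orth <- lm(y ~ poly(x, 3))               # orthogonal polynomials (default)
fit_raw  <- lm(y ~ poly(x, 3, raw = TRUE))   # raw powers x, x^2, x^3

# the coefficients differ between the two parameterisations
coef(fit_orth)
coef(fit_raw)

# but the fitted values agree
all.equal(fitted(fit_orth), fitted(fit_raw))
```

Both models remain linear in their coefficients, which is why lm() can fit a curved trend after the transformation.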
When you fit a model, the coefficients are estimated from your observed data and plugged into the model, so that the general form becomes a specific fitted equation, for example:
\[y = a_1 + a_2x\]
\[y = 7 + 3x\]
So the linear model formula \(y\sim{x}\) translates to the equation \(y = a_1 + a_2x\).
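A minimal sketch of this correspondence: if noise-free data are generated from \(y = 7 + 3x\) (the x values 1 to 10 are an arbitrary choice), fitting the formula y ~ x should recover 7 and 3 as the estimated coefficients, up to floating-point error.

```r
x <- 1:10
y <- 7 + 3 * x        # noise-free data generated from y = 7 + 3x

fit <- lm(y ~ x)      # the formula y ~ x stands for y = a_1 + a_2 * x
coef(fit)             # a_1 is reported as "(Intercept)", a_2 as "x"
```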
Behind the scenes, R turns the formula into a numeric design matrix with one column per term of the equation. We can see it with the function model_matrix():
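As a small illustration, the sketch below applies modelr::model_matrix() to a tiny made-up data frame (the values are arbitrary, and modelr is assumed to be installed). model_matrix() is a thin wrapper around stats::model.matrix() that returns a tibble.

```r
library(modelr)   # provides model_matrix(); assumed to be installed

# tiny made-up data frame, for illustration only
df <- data.frame(y = c(4, 5), x1 = c(2, 1), x2 = c(5, 6))

# one column per term in the equation; "(Intercept)" is a column of ones
mm <- model_matrix(df, y ~ x1)
mm

# "- 1" drops the intercept column
model_matrix(df, y ~ x1 - 1)

# adding a predictor to the formula adds a column
model_matrix(df, y ~ x1 + x2)
```

The column of ones is what lets the intercept \(a_1\) be estimated like any other coefficient.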