29.2 Linear models and non-linear models

Linear models assume a relationship of the form:

\[y = a_1x_1 + a_2x_2 + \dots + a_nx_n\]

and assume that the residuals (the differences between the observed and predicted values) are normally distributed.
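
To check that assumption on a fitted model, here is a minimal sketch (my own example, not from the source, using the built-in mtcars data; the object name fit is made up) that inspects the residuals of a simple lm() fit with a Q-Q plot:

# sketch: residual normality check for a simple linear model
fit <- lm(mpg ~ wt, data = mtcars)
qqnorm(resid(fit))   # points should lie roughly on a straight line
qqline(resid(fit))   # reference line for a normal distribution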

Types of linear models and closely related extensions, with their R fitting functions (a short fitting sketch follows the list):

  • Linear models - stats::lm()
  • Generalised linear models - stats::glm()
  • Generalised additive models - mgcv::gam()
  • Penalised linear models - glmnet::glmnet()
  • Robust linear models - MASS::rlm()
  • Trees - rpart::rpart()
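
As a rough illustration (my own sketch, not from the source; the data frame df and the fit_* objects are made up), each of these families can be fitted with a very similar call on a small simulated data set:

# sketch: fitting each family on made-up data
library(MASS)     # rlm()
library(mgcv)     # gam()
library(glmnet)   # glmnet()
library(rpart)    # rpart()

set.seed(123)
df <- data.frame(x1 = runif(100), x2 = runif(100))
df$y <- 2 + 3 * df$x1 - df$x2 + rnorm(100, sd = 0.2)

fit_lm   <- lm(y ~ x1 + x2, data = df)                      # ordinary least squares
fit_glm  <- glm(y ~ x1 + x2, data = df, family = gaussian)  # generalised linear model
fit_gam  <- gam(y ~ s(x1) + x2, data = df)                  # smooth term on x1
fit_rlm  <- rlm(y ~ x1 + x2, data = df)                     # robust to outliers
fit_net  <- glmnet(as.matrix(df[, c("x1", "x2")]), df$y)    # penalised (lasso path)
fit_tree <- rpart(y ~ x1 + x2, data = df)                   # regression tree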

Non-linear models, by contrast, capture a non-linear trend in the relationship between the predictors and the response.

Some models require predictors that have been centered and scaled:

  • neural networks
  • K-nearest neighbors
  • support vector machines (SVM)

while others require a traditional response surface design model expansion (quadratic terms and two-way interactions); see the sketch below.
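
As a sketch of both points (my own example; df and df_scaled are made-up toy data), predictors can be centered and scaled with base R's scale(), and a response-surface style expansion can be written directly in the model formula:

# sketch: centre/scale predictors and expand the formula (toy data)
set.seed(42)
df <- data.frame(x1 = runif(50), x2 = runif(50))
df$y <- 1 + df$x1 + df$x2 + rnorm(50, sd = 0.1)

# scale() subtracts the column mean and divides by the column standard deviation
df_scaled <- df
df_scaled[c("x1", "x2")] <- scale(df[c("x1", "x2")])

# quadratic terms plus the two-way interaction, in one formula
lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = df_scaled)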

29.2.1 Transformations

Transformations of the predictors let us fit non-linear trends while staying within the linear-model framework:

# weighted regression
library(tidyverse)   # ggplot2 and dplyr (%>%, mutate, case_when)
library(modelr)      # sim3 data set, model_matrix()

data(sim3)

# recode the factor x2 into numeric weights
sim3_2 <- sim3 %>%
  mutate(x3 = case_when(x2 == "a" ~ 1,
                        x2 == "b" ~ 2,
                        x2 == "c" ~ 3,
                        x2 == "d" ~ 4))

mod3w <- lm(y ~ x1, data = sim3_2, weights = x3)

p1 <- ggplot(sim3_2, aes(x1, y)) +
  geom_point() +
  geom_smooth(method = "lm") +   # linear fit, to match the panel title
  labs(title = "Linear model")

# polynomial transformation: a degree-3 polynomial in x1
mod3t <- lm(y ~ poly(x1, 3), data = sim3_2, weights = x3)

p2 <- ggplot(sim3_2, aes(x1, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, fill = NA,
              formula = y ~ poly(x, 3, raw = TRUE), colour = "red") +
  labs(title = "Polynomial transf.")

# splines: a B-spline basis with 3 degrees of freedom
library(splines)
mod3s <- lm(y ~ bs(x1, 3), data = sim3_2, weights = x3)

p3 <- ggplot(sim3_2, aes(x1, y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, fill = NA,
              formula = y ~ splines::bs(x, 3), colour = "red") +
  labs(title = "Spline transf.")

# combine the three plots side by side
library(patchwork)
p1 + p2 + p3
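
To put numbers next to the plots (my own sketch, not in the source; it reuses mod3w, mod3t, and mod3s from above and assumes the broom package is available), the weighted linear fit is nested inside the cubic polynomial fit, so anova() gives an F-test, and broom::glance() collects fit statistics for all three models:

# sketch: numeric comparison of the three weighted fits
anova(mod3w, mod3t)        # linear vs. cubic polynomial (nested models)

library(broom)
bind_rows(lm     = glance(mod3w),
          poly   = glance(mod3t),
          spline = glance(mod3s),
          .id = "model")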

When you fit a model, you estimate the coefficients of the model formula from your observed data, for example:

\[y = a_1 + a_2x\]

\[y = 7 + 3x\]

So the R model formula \(y \sim x\) translates to \(y = a_1 + a_2x\).

Behind the scenes, what happens is:

model1 <- function(a, data) {
  # a[1] is the intercept, a[2] the slope for x1
  a[1] + data$x1 * a[2]
}
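
For example, with the coefficients \(a_1 = 7\) and \(a_2 = 3\) from the fitted equation above, this function simply returns the predicted values:

model1(c(7, 3), sim3)   # 7 + 3 * x1 for every row of sim3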

We can see the corresponding design matrix with modelr::model_matrix():

model_matrix(sim3, y ~ x1)
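
A small sketch (my addition; the object names mm and fit are made up) to make the link explicit: multiplying the model matrix by the fitted coefficients reproduces the model's predictions.

# sketch: predictions are the model matrix times the coefficient vector
mm  <- model_matrix(sim3, y ~ x1)      # columns: (Intercept), x1
fit <- lm(y ~ x1, data = sim3)
head(as.matrix(mm) %*% coef(fit))      # same values as head(predict(fit))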