Notes

Summary of Chapter 7 Models

  • Polynomials
  • Step functions
  • Splines
    • Regression splines
      • Natural splines
    • Smoothing splines
  • Local regression

Properties flagged for the models above:

  • Element of a Generalized Additive Model
  • Can be represented with basis functions
  • Efficient LOOCV

Key Python Packages

  • [pygam](https://pygam.readthedocs.io/en/latest/)
    • A comprehensive framework for specifying GAMs
  • scipy.interpolate
    • BSpline() - class for basis splines (B-splines); its workflow differs from scikit-learn's
    • CubicSpline() - class for cubic interpolating splines; same scipy workflow, also different from scikit-learn's

Polynomials and Step Functions

Basis functions

  • Basis functions let us represent non-linear relationships in the standard linear modeling framework.

$$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \beta_3 b_3(x_i) + \dots + \beta_k b_k(x_i) + \epsilon_i$$

where $b_1, \dots, b_k$ are fixed, known functions.

  • Matrix representation: $\mathbf{y}_{n \times 1} = \mathbf{A}_{n \times (k+1)}\,\boldsymbol{\beta}_{(k+1) \times 1} + \boldsymbol{\epsilon}_{n \times 1}$, where the first column of $\mathbf{A}$ is the intercept column $b_0(x) \equiv 1$

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix}
b_0(x_1) & b_1(x_1) & \cdots & b_k(x_1) \\
b_0(x_2) & b_1(x_2) & \cdots & b_k(x_2) \\
\vdots   & \vdots   & \ddots & \vdots   \\
b_0(x_n) & b_1(x_n) & \cdots & b_k(x_n)
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

  • The formula for $\hat{\beta}$ is even the same: $\hat{\beta} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
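  • As a minimal sketch (toy data of my own, not from the book), the matrix form can be fit directly with numpy; here the basis is the cubic polynomial $b_j(x) = x^j$:

import numpy as np
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=100))
y = np.sin(x) + rng.normal(scale=0.3, size=100)
# Columns of A are the basis functions b_j(x) evaluated at each sample
A = np.column_stack([x**j for j in range(4)])
# beta_hat = (A^T A)^{-1} A^T y, computed with a least-squares solver
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta_hat

  • Swapping in a different set of basis functions (step functions, splines) only changes the columns of $\mathbf{A}$; the fitting step is unchanged.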

Splines

  • Splines are another class of nonlinear, wiggly functions
  • Splines can achieve similar flexibility to polynomial regression with much lower degree and hence more stability
  • They achieve this by fitting low-degree polynomial functions over pre-set ranges of the independent variable

Regression Splines

  • The regression spline is a set of low-degree polynomial functions connected by knots and constraints
  • knots define the range of each constituent polynomial function
  • constraints define the relationships between the functions
    • e.g. continuity and smoothness at the knots, linearity at the boundaries
  • The relationship between the number of basis functions (predictors in the fit), the number of knots, and the degrees of freedom is somewhat arcane (a cubic spline with $K$ knots uses $K + 4$ degrees of freedom)
  • The truncated power basis lets us write the cubic spline in the basis-function framework, with one basis function per knot $\xi$: $h(x, \xi) = (x - \xi)^3_+ = \begin{cases} (x - \xi)^3 & \text{if } x > \xi \\ 0 & \text{otherwise} \end{cases}$

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \beta_4 h(x_i, \xi_1) + \beta_5 h(x_i, \xi_2) + \beta_6 h(x_i, \xi_3) + \epsilon_i$$
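  • As a sketch (the x range and knot locations are just illustrative values of my own), the corresponding design matrix can be built directly from the truncated power basis:

import numpy as np

def h(x, xi):
    # Truncated power basis: (x - xi)^3 for x > xi, else 0
    return np.where(x > xi, (x - xi)**3, 0.0)

x = np.linspace(18, 80, 200)
knots = [25, 40, 60]
# Columns: 1, x, x^2, x^3, h(x, xi_1), h(x, xi_2), h(x, xi_3)
X = np.column_stack([np.ones_like(x), x, x**2, x**3] +
                    [h(x, xi) for xi in knots])

  • Fitting this matrix by OLS gives a cubic spline whose value, first derivative and second derivative are continuous at each knot.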

Natural Splines

  • Natural splines impose an additional boundary constraint on regression splines: the fit is required to be linear beyond the boundary knots, which reduces the variance of the fit at the extremes of the data
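  • A sketch of building such a basis with ISLP (assuming the NaturalSpline transform in ISLP.transforms mirrors the BSpline transform shown in the computation section below):

from ISLP import load_data
from ISLP.transforms import NaturalSpline
Wage = load_data('Wage')
age = Wage['age']
# Natural cubic spline basis: constrained to be linear beyond the boundary knots
ns_ = NaturalSpline(internal_knots=[25, 40, 60], intercept=True).fit(age)
ns_age = ns_.transform(age)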

Smoothing Splines

  • Smoothing splines are regularized splines!
  • Like with Ridge and Lasso regression, we derive the function from a minimization criterion:

$$\arg\min_g \left( \mathrm{RSS} + \lambda \int g''(t)^2 \, dt \right)$$

  • Reminder, the Ridge minimization criterion: $\arg\min_\beta \left( \mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2 \right)$

  • It’s the same! But this time the penalty applies to the second derivative of the fitted function $g$, integrated over its range, rather than to the coefficients.

  • “Shrinkage”: the penalty pulls the second derivative of $g$ towards zero, flattening the fit in regions where it would otherwise be most complex

  • Think of it as a “wiggliness” penalty

  • In general:

    • $\lambda \to 0$: low bias, high variance (i.e. overfitting)
    • $\lambda \to \infty$: high bias, low variance (i.e. linear regression)
    • $\lambda \to 0 \Rightarrow df_\lambda \to n$
    • $\lambda \to \infty \Rightarrow df_\lambda \to 2$
  • Solving for the degrees of freedom is a bit arcane

  • It is the sum of the diagonal elements of the smoother matrix $S_\lambda$, where $\hat{g}_\lambda = S_\lambda y$:

$$df_\lambda = \sum_{i=1}^{n} \{S_\lambda\}_{ii}$$
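  • As a sketch with toy data: recent SciPy releases (1.10+, if I recall correctly) provide scipy.interpolate.make_smoothing_spline, which minimizes this penalized criterion for a given lam:

import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, size=200))
y = np.sin(x) + rng.normal(scale=0.3, size=200)
# lam plays the role of lambda; lam=None chooses it by generalized cross-validation
spl = make_smoothing_spline(x, y, lam=1.0)
y_smooth = spl(x)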

Computation of Splines

  • ISLP wraps scipy.interpolate to fit into the “transform” workflow
import statsmodels.api as sm
from ISLP import load_data
from ISLP.models import ModelSpec as MS, bs, summarize
Wage = load_data('Wage')
y = Wage['wage']
# B-spline basis for age with three internal knots, fit by OLS
bs_age = MS([bs('age',
                internal_knots=[25,40,60],
                name='bs(age)')])
Xbs = bs_age.fit_transform(Wage)
M = sm.OLS(y, Xbs).fit()
summarize(M)
             coef     std err    t      P>|t|
intercept    60.4937  9.460      6.394  0.000
bs(age)[0]   3.9805   12.538     0.317  0.751
bs(age)[1]   44.6310  9.626      4.636  0.000
bs(age)[2]   62.8388  10.755     5.843  0.000
bs(age)[3]   55.9908  10.706     5.230  0.000
bs(age)[4]   50.6881  14.402     3.520  0.000
bs(age)[5]   16.6061  19.126     0.868  0.385
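
  • To see the fitted curve, the same ModelSpec can transform a grid of ages and the fitted OLS results can predict on it; a sketch using the objects defined above (the grid construction is my own):

import numpy as np
import pandas as pd
# Evaluate the fitted spline on an evenly spaced grid of ages
age_grid = pd.DataFrame({'age': np.linspace(Wage['age'].min(),
                                            Wage['age'].max(), 100)})
X_grid = bs_age.transform(age_grid)
preds = M.get_prediction(X_grid).predicted_mean
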
from ISLP.transforms import BSpline
age = Wage['age']
# The same basis matrix, built directly with ISLP's BSpline transform
bs_ = BSpline(internal_knots=[25,40,60], intercept=True).fit(age)
bs_age = bs_.transform(age)
bs_age.head()
(Columns: BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0, upper_bound=80.0)[0] through [6], abbreviated to [0]–[6] below)

        [0]       [1]       [2]       [3]       [4]       [5]   [6]
0  1.000000  0.000000  0.000000  0.000000  0.000000  0.000000   0.0
1  0.002915  0.559911  0.403778  0.033395  0.000000  0.000000   0.0
2  0.000000  0.000000  0.114796  0.618564  0.262733  0.003906   0.0
3  0.000000  0.000000  0.167109  0.633167  0.198880  0.000844   0.0
4  0.000000  0.000000  0.034014  0.508194  0.426542  0.031250   0.0
  • The scipy.interpolate documentation is…a bit technical
  • It is left as an exercise to the reader to figure out how to create a model matrix without ISLP’s help. Try splrep()
import numpy as np
import scipy.interpolate as interpolate
x = np.array([0., 1.2, 1.9, 3.2, 4., 6.5])
y = np.array([0., 2.3, 3., 4.3, 2.9, 3.1])
# t = knots, c = coefficients, k = degree
t, c, k = interpolate.splrep(x, y, s=0, k=4)
spline = interpolate.BSpline(t, c, k, extrapolate=False)
print('t:', t)
print('c:', c)
print('k:', k)
t: [0.   0.   0.   0.   0.   2.55 6.5  6.5  6.5  6.5  6.5 ]
c: [-5.62048630e-18  2.98780300e+00 -5.74472095e-01  1.46700914e+01
 -1.03253068e+01  3.10000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00]
k: 4
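
  • For the model-matrix exercise above, one possible route (an assumption on my part, not something the book shows): newer SciPy releases expose a BSpline.design_matrix class method that evaluates every basis element at the sample points:

# Sparse n x (len(t) - k - 1) matrix of all B-spline basis functions at x
D = interpolate.BSpline.design_matrix(x, t, k, extrapolate=False)
A = D.toarray()   # dense model matrix, ready for OLS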

GAMs

  • GAMs extend basis functions to the multivariate case…and more! Each predictor gets its own smooth function $f_j$, and the contributions are added up: $y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \epsilon_i$
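
  • A minimal pygam sketch (synthetic data; per the pygam quick-start, s() declares a penalized spline term for a column of X):

import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)
# One smooth term per predictor; the f_j are fit jointly, each with its own wiggliness penalty
gam = LinearGAM(s(0) + s(1)).fit(X, y)
gam.summary()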