Notes

Summary of Chapter 7 Models

  • Polynomials \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
  • Step functions \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
  • Splines
    • Regression splines \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
      • Natural splines \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
    • Smoothing splines \(\clubsuit\) \(\heartsuit\)
  • Local regression

\(\clubsuit\) = Element of Generalized Additive Model
\(\spadesuit\) = Can be represented with basis function
\(\heartsuit\) = Efficient LOOCV

Key Python Packages

  • [pygam] (https://pygam.readthedocs.io/en/latest/)
    • A comprehensive framework for specifying GAMs
  • scipy.interpolate
    • BSpline() - class for basis splines (but different workflow from scikit-learn)
    • CubicSpline() - class for cubic splines in scipy workflow (but different workflow from scikit-learn)

Polynomials and Step Functions

Basis functions

  • Basis functions let us represent non-linear relationships in the standard linear modeling framework.

\[y_i = \beta_0 + \beta_1b_1(x_i) + $\beta_2b_2(x_i) + \beta_3b_3(x_i) + ... \beta_k b_k(x_i) + \epsilon_i\]

where \(b_1\)\(b_k\) are known functions.

  • Matrix representation: \[\underbrace{y}_{n \times 1} = \underbrace{\mathbf{A}}_{n \times k} \underbrace{\beta}_{k \times 1} + \underbrace{\epsilon}_{n \times 1}\]

\[ \begin{bmatrix} y_1 \\ y_2 \\ ... \\ y_n \end{bmatrix} = \begin{bmatrix} b_0(x_1) & b_1(x_1) & b_k(x_1) \\ b_0(x_2) & b_1(x_2) & b_k(x_2) \\ \dots & \ddots & \vdots \\ b_0(x_n) & b_0(x_n) & b_k(x_n) \end{bmatrix} \cdot \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]

  • The formula for \(\hat{\beta}\) is even the same. \[ \hat{\beta} = (\mathbf{A}^{\intercal} \mathbf{A}) \mathbf{A} y\]

Splines

  • Splines are another class of nonlinear, wiggly functions
  • Splines can achieve similar flexibility to polynomial regression with much lower degree and hence more stability
  • They achieve this by fitting low-degree polynomial functions over pre-set ranges of the independent variable

Regression Splines

  • The regression spline is a set of low-degree polynomial functions connected by knots and constraints
  • knots define the range of each constituent polynomial function
  • constraints define the relationships between the functions
    • i.e. continuity, smoothness, linearity at the boundary
  • The relationship between the number of predictors, knots and degrees of freedom is somewhat arcane
  • The truncated power basis function allows us to formulate the cubic spline as a basis function \[h(\mathcal{x}, \xi) = (x - \xi)^3_+ = \begin{cases} (\mathcal(x)-\xi)^3 & \text{if } \mathcal{x} > \xi \\ 0 & \text{otherwise} \end{cases} \]

\(y_i = \beta_0 + \beta_1 \cdot x_i + \beta_2 \cdot x_i^2 + \beta^3 \cdot x_i^3 + \beta_4 h(x_i, \xi_1) + \beta_5 h(x_i, \xi_2) + \beta_6\cdot h(x_i, \xi_3) + \epsilon_i\)

Natural Splines

  • Natural splines impose a boundary constraint on regression splines

Smoothing Splines

  • Smoothing splines are regularized splines!
  • Like with Ridge and Lasso regression, we derive the function from a minimization criteria:

\[\underset{g}{\text{arg min}}(\text{RSS} + \lambda \int g^{''}(t)^2dt) \]

  • Reminder about the Ridge minimization criteria: \[\underset{\beta_j}{\text{arg min}}(\text{RSS} + \lambda (\sum_{j=1}^p \beta_j^2)\]

  • It’s the same! But this time the penalty applies to the second derivative of the polynomial function.

  • “Shrinkage” of coefficients in more complex regions of the model towards zero

  • Think of it as a “wiggliness” penalty

  • In general:

    • \(\lambda \to 0\) we have low bias, high variance (i.e. overfitting)
    • \(\lambda \to \inf\) we have high bias, low variance (i.e. linear regression)
    • \(\lambda \to 0\) then \(df_\lambda \to n\)
    • \(\lambda \to \inf\) then \(df_\lambda \to 2\)
  • Solving for the degrees of freedom is a bit arcane

  • It is the sum of the diagonal elements of \(\mathbf{S}_{\lambda_{ii}}\) \[\hat{g}_\lambda = \mathbf{S}_\lambda y\]

\[df_\lambda = \sum_{i=1}^n \mathbf{S}_{\lambda_{ii}}\]

Computation of Splines

  • ISLP wraps scipy.interpolate to fit into the “transform” workflow
bs_age = MS([bs('age',
                internal_knots=[25,40,60],
                name='bs(age)')])
Xbs = bs_age.fit_transform(Wage)
M = sm.OLS(y, Xbs).fit()
summarize(M)
coef std err t P>|t|
intercept 60.4937 9.460 6.394 0.000
bs(age)[0] 3.9805 12.538 0.317 0.751
bs(age)[1] 44.6310 9.626 4.636 0.000
bs(age)[2] 62.8388 10.755 5.843 0.000
bs(age)[3] 55.9908 10.706 5.230 0.000
bs(age)[4] 50.6881 14.402 3.520 0.000
bs(age)[5] 16.6061 19.126 0.868 0.385
bs_ = BSpline(internal_knots=[25,40,60], intercept=True).fit(age)
bs_age = bs_.transform(age)
bs_age.head()
BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0,\n upper_bound=80.0)[0] BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0,\n upper_bound=80.0)[1] BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0,\n upper_bound=80.0)[2] BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0,\n upper_bound=80.0)[3] BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0,\n upper_bound=80.0)[4] BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0,\n upper_bound=80.0)[5] BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0,\n upper_bound=80.0)[6]
0 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
1 0.002915 0.559911 0.403778 0.033395 0.000000 0.000000 0.0
2 0.000000 0.000000 0.114796 0.618564 0.262733 0.003906 0.0
3 0.000000 0.000000 0.167109 0.633167 0.198880 0.000844 0.0
4 0.000000 0.000000 0.034014 0.508194 0.426542 0.031250 0.0
  • The scipy.interpolate() documentation is…a bit technical
  • It is left as an exercise to the reader to figure out how to create a model matrix without ISLP’s help. Try splrep()
import scipy.interpolate as interpolate 
x = np.array([ 0. ,  1.2,  1.9,  3.2,  4. ,  6.5])
y = np.array([ 0. ,  2.3,  3. ,  4.3,  2.9,  3.1])
# t = knots, c = coefficients, k = degree
t, c, k = interpolate.splrep(x, y, s=0, k=4)
spline = interpolate.BSpline(t, c, k, extrapolate = False)
t: [0.   0.   0.   0.   0.   2.55 6.5  6.5  6.5  6.5  6.5 ]
c: [-5.62048630e-18  2.98780300e+00 -5.74472095e-01  1.46700914e+01
 -1.03253068e+01  3.10000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00]
k: 4

GAMs

  • GAMs extend basis functions to the multivariate case…and more!