| | coef | std err | t | P>\|t\| |
| --- | --- | --- | --- | --- |
| intercept | 60.4937 | 9.460 | 6.394 | 0.000 |
| bs(age)[0] | 3.9805 | 12.538 | 0.317 | 0.751 |
| bs(age)[1] | 44.6310 | 9.626 | 4.636 | 0.000 |
| bs(age)[2] | 62.8388 | 10.755 | 5.843 | 0.000 |
| bs(age)[3] | 55.9908 | 10.706 | 5.230 | 0.000 |
| bs(age)[4] | 50.6881 | 14.402 | 3.520 | 0.000 |
| bs(age)[5] | 16.6061 | 19.126 | 0.868 | 0.385 |
Summary of Chapter 7 Models
- Polynomials \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
- Step functions \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
- Splines
  - Regression splines \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
  - Natural splines \(\clubsuit\) \(\spadesuit\) \(\heartsuit\)
  - Smoothing splines \(\clubsuit\) \(\heartsuit\)
- Local regression
\(\clubsuit\) = Element of Generalized Additive Model
\(\spadesuit\) = Can be represented with basis function
\(\heartsuit\) = Efficient LOOCV
Key Python Packages
- [`pygam`](https://pygam.readthedocs.io/en/latest/) - a comprehensive framework for specifying GAMs
- `scipy.interpolate` - provides the `BSpline()` class for basis splines and the `CubicSpline()` class for cubic splines (both follow a different workflow from scikit-learn)
Polynomials and Step Functions
Basis functions
- Basis functions let us represent non-linear relationships in the standard linear modeling framework.
\[y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \beta_3 b_3(x_i) + \dots + \beta_k b_k(x_i) + \epsilon_i\]
where \(b_1\)…\(b_k\) are known functions.
- Matrix representation: \[\underbrace{y}_{n \times 1} = \underbrace{\mathbf{A}}_{n \times (k+1)} \underbrace{\beta}_{(k+1) \times 1} + \underbrace{\epsilon}_{n \times 1}\]
\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} b_0(x_1) & b_1(x_1) & \dots & b_k(x_1) \\ b_0(x_2) & b_1(x_2) & \dots & b_k(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ b_0(x_n) & b_1(x_n) & \dots & b_k(x_n) \end{bmatrix} \cdot \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]
- Even the least-squares formula for \(\hat{\beta}\) is the same. \[ \hat{\beta} = (\mathbf{A}^{\intercal} \mathbf{A})^{-1} \mathbf{A}^{\intercal} y\]
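To make this concrete, here is a minimal numpy sketch (the data and basis choice are illustrative) that builds a basis matrix \(\mathbf{A}\) and applies the least-squares formula above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data with a non-linear relationship
x = rng.uniform(0, 10, size=100)
y = 2 + 0.5 * x - 0.3 * x**2 + rng.normal(scale=1.0, size=100)

# Basis matrix A with columns b_0(x) = 1, b_1(x) = x, b_2(x) = x^2
A = np.column_stack([np.ones_like(x), x, x**2])

# beta_hat = (A^T A)^{-1} A^T y, computed stably via lstsq
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_hat)  # roughly [2, 0.5, -0.3]
```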
Splines
- Splines are another class of nonlinear, wiggly functions
- Splines can achieve similar flexibility to polynomial regression with much lower degree and hence more stability
- They achieve this by fitting low-degree polynomial functions over pre-set ranges of the independent variable
Regression Splines
- The regression spline is a set of low-degree polynomial functions connected by knots and constraints
  - Knots define the range of each constituent polynomial function
  - Constraints define the relationships between the functions
    - e.g. continuity, smoothness, and linearity at the boundary
- The relationship between the number of knots, the polynomial degree, and the degrees of freedom is somewhat arcane; a cubic spline with \(K\) knots, for example, uses \(K + 4\) degrees of freedom
- The truncated power basis function allows us to formulate the cubic spline as a basis function \[h(x, \xi) = (x - \xi)^3_+ = \begin{cases} (x-\xi)^3 & \text{if } x > \xi \\ 0 & \text{otherwise} \end{cases} \]
\[y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \beta_4 h(x_i, \xi_1) + \beta_5 h(x_i, \xi_2) + \beta_6 h(x_i, \xi_3) + \epsilon_i\]
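A minimal numpy sketch of this model matrix (knot locations borrowed from the `bs(age)` example above):

```python
import numpy as np

def h(x, xi):
    """Truncated power basis: (x - xi)^3 where x > xi, else 0."""
    return np.where(x > xi, (x - xi) ** 3, 0.0)

x = np.linspace(18, 80, 200)  # an age-like predictor
knots = [25, 40, 60]          # xi_1, xi_2, xi_3

# Columns: 1, x, x^2, x^3, h(x, xi_1), h(x, xi_2), h(x, xi_3)
A = np.column_stack([np.ones_like(x), x, x**2, x**3] +
                    [h(x, xi) for xi in knots])
print(A.shape)  # (200, 7): K + 4 = 7 degrees of freedom with K = 3 knots
```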
Natural Splines
- Natural splines impose a boundary constraint on regression splines: the function must be linear beyond the boundary knots, which reduces variance at the edges of the data
Smoothing Splines
- Smoothing splines are regularized splines!
- Like with Ridge and Lasso regression, we derive the function from a minimization criterion:
\[\underset{g}{\text{arg min}}(\text{RSS} + \lambda \int g^{''}(t)^2dt) \]
Reminder about the Ridge minimization criterion: \[\underset{\beta}{\text{arg min}}\left(\text{RSS} + \lambda \sum_{j=1}^p \beta_j^2\right)\]
It’s the same idea! But this time the penalty applies to the second derivative of the fitted function.
“Shrinkage” here pushes the curvature of the fit in the more complex regions of the model towards zero.
Think of it as a “wiggliness” penalty.
In general:
- \(\lambda \to 0\): low bias, high variance (i.e. overfitting)
- \(\lambda \to \infty\): high bias, low variance (i.e. linear regression)
- \(\lambda \to 0\) then \(df_\lambda \to n\)
- \(\lambda \to \infty\) then \(df_\lambda \to 2\)
Solving for the effective degrees of freedom is a bit arcane.
\(df_\lambda\) is the sum of the diagonal elements of the smoother matrix \(\mathbf{S}_\lambda\), which maps the observed responses to the fitted values: \[\hat{g}_\lambda = \mathbf{S}_\lambda y\]
\[df_\lambda = \sum_{i=1}^n \{\mathbf{S}_\lambda\}_{ii}\]
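As a sketch in code: scipy 1.10+ provides `make_smoothing_spline`, whose `lam` argument plays the role of \(\lambda\) in the criterion above (the data here is simulated):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=200))  # x must be increasing
y = np.sin(x) + rng.normal(scale=0.3, size=200)

# Small lam -> wiggly fit (low bias, high variance);
# large lam -> nearly linear fit (high bias, low variance)
wiggly = make_smoothing_spline(x, y, lam=1e-4)
stiff = make_smoothing_spline(x, y, lam=1e2)
print(wiggly(5.0), stiff(5.0))
```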
Computation of Splines
`ISLP` wraps `scipy.interpolate` to fit into the scikit-learn “transform” workflow.
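For instance, a minimal sketch that produces a model matrix like the one below, assuming the `Wage` data bundled with `ISLP` (the `BSpline` import path may differ across `ISLP` versions):

```python
import numpy as np
from ISLP import load_data
from ISLP.transforms import BSpline  # import path may vary by version

Wage = load_data('Wage')
age = np.asarray(Wage['age'])  # reshape to a column if your version expects 2-D input

# fit() learns the boundary knots (here the min/max of age);
# transform() produces the spline model matrix
bs_age = BSpline(internal_knots=[25, 40, 60], intercept=True).fit(age)
X = bs_age.transform(age)
print(X.shape)  # (n, 7)
```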
First rows of the model matrix produced by `BSpline(intercept=True, internal_knots=[25, 40, 60], lower_bound=18.0, upper_bound=80.0)`; the columns are the seven basis functions `[0]`–`[6]`:

| | [0] | [1] | [2] | [3] | [4] | [5] | [6] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 |
| 1 | 0.002915 | 0.559911 | 0.403778 | 0.033395 | 0.000000 | 0.000000 | 0.0 |
| 2 | 0.000000 | 0.000000 | 0.114796 | 0.618564 | 0.262733 | 0.003906 | 0.0 |
| 3 | 0.000000 | 0.000000 | 0.167109 | 0.633167 | 0.198880 | 0.000844 | 0.0 |
| 4 | 0.000000 | 0.000000 | 0.034014 | 0.508194 | 0.426542 | 0.031250 | 0.0 |
- The `scipy.interpolate` documentation is… a bit technical
- It is left as an exercise to the reader to figure out how to create a model matrix without `ISLP`’s help. Try `splrep()`, which returns the knot vector `t`, the spline coefficients `c`, and the degree `k`:
t: [0. 0. 0. 0. 0. 2.55 6.5 6.5 6.5 6.5 6.5 ]
c: [-5.62048630e-18 2.98780300e+00 -5.74472095e-01 1.46700914e+01
-1.03253068e+01 3.10000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00]
k: 4
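One possible solution sketch for the exercise, assuming scipy 1.8+ (which provides `BSpline.design_matrix`): repeat each boundary knot \(k + 1\) times to form the full knot vector, then evaluate the basis directly. This reproduces the seven-column matrix shown above:

```python
import numpy as np
from scipy.interpolate import BSpline

x = np.linspace(18, 80, 5)
k = 3                          # cubic spline
internal_knots = [25, 40, 60]

# Full knot vector: each boundary knot repeated k + 1 times
t = np.r_[[x.min()] * (k + 1), internal_knots, [x.max()] * (k + 1)]

# Sparse (n, len(t) - k - 1) matrix of B-spline basis values
A = BSpline.design_matrix(x, t, k).toarray()
print(A.shape)  # (5, 7)
```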
GAMs
- GAMs extend basis functions to the multivariate case…and more!
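A minimal `pygam` sketch (the data and feature indices are illustrative): one smooth spline term per feature, combined additively:

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=300)

# s(i) adds a smooth (spline) term for feature i; the terms are
# summed, hence "additive"
gam = LinearGAM(s(0) + s(1)).fit(X, y)
gam.summary()
```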