## Notes {.unnumbered}```{python}#| echo: falseimport numpy as np, pandas as pdfrom matplotlib.pyplot import subplotsimport statsmodels.api as smfrom ISLP import load_datafrom ISLP.models import (summarize, poly, ModelSpec as MS)from statsmodels.stats.anova import anova_lmfrom pygam import (s as s_gam, l as l_gam, f as f_gam, LinearGAM, LogisticGAM)from ISLP.transforms import (BSpline, NaturalSpline)from ISLP.models import bs, nsfrom ISLP.pygam import (approx_lam, degrees_of_freedom, plot as plot_gam, anova as anova_gam)Wage = load_data('Wage')y = Wage['wage']age = Wage['age']```### Summary of Chapter 7 Models- Polynomials $\clubsuit$ $\spadesuit$ $\heartsuit$- Step functions $\clubsuit$ $\spadesuit$ $\heartsuit$- Splines - Regression splines $\clubsuit$ $\spadesuit$ $\heartsuit$ - Natural splines $\clubsuit$ $\spadesuit$ $\heartsuit$ - Smoothing splines $\clubsuit$ $\heartsuit$- Local regression$\clubsuit$ = Element of Generalized Additive Model \$\spadesuit$ = Can be represented with basis function \$\heartsuit$ = Efficient LOOCV### Key Python Packages- [`pygam`] (https://pygam.readthedocs.io/en/latest/) - A comprehensive framework for specifying GAMs- `scipy.interpolate` - `BSpline()` - class for basis splines (but different workflow from scikit-learn) - `CubicSpline()` - class for cubic splines in scipy workflow (but different workflow from scikit-learn)### Polynomials and Step Functions### Basis functions- Basis functions let us represent non-linear relationships in the standard linear modeling framework.$$y_i = \beta_0 + \beta_1b_1(x_i) + $\beta_2b_2(x_i) + \beta_3b_3(x_i) + ... \beta_k b_k(x_i) + \epsilon_i$$where $b_1$...$b_k$ are known functions.- Matrix representation: $$\underbrace{y}_{n \times 1} = \underbrace{\mathbf{A}}_{n \times k}\underbrace{\beta}_{k \times 1} + \underbrace{\epsilon}_{n \times 1}$$$$ \begin{bmatrix} y_1 \\y_2 \\... \\y_n\end{bmatrix} = \begin{bmatrix}b_0(x_1) & b_1(x_1) & b_k(x_1) \\b_0(x_2) & b_1(x_2) & b_k(x_2) \\ \dots & \ddots & \vdots \\b_0(x_n) & b_0(x_n) & b_k(x_n)\end{bmatrix} \cdot \begin{bmatrix}\beta_0 \\\beta_1 \\\vdots \\\beta_k \end{bmatrix} +\begin{bmatrix}\epsilon_1 \\\epsilon_2 \\\vdots \\\epsilon_n\end{bmatrix}$$- The formula for $\hat{\beta}$ is even the same.$$ \hat{\beta} = (\mathbf{A}^{\intercal} \mathbf{A}) \mathbf{A} y$$### Splines- Splines are another class of nonlinear, wiggly functions- Splines can achieve similar flexibility to polynomial regression with much lower degree and hence more stability- They achieve this by fitting low-degree polynomial functions over pre-set ranges of the independent variable#### Regression Splines- The regression spline is a set of low-degree polynomial functions connected by *knots* and *constraints*- *knots* define the range of each constituent polynomial function- *constraints* define the relationships between the functions - i.e. continuity, smoothness, linearity at the boundary - The relationship between the number of predictors, knots and degrees of freedom is somewhat arcane- The truncated power basis function allows us to formulate the cubic spline as a basis function$$h(\mathcal{x}, \xi) = (x - \xi)^3_+ = \begin{cases} (\mathcal(x)-\xi)^3 & \text{if } \mathcal{x} > \xi \\ 0 & \text{otherwise} \end{cases}$$$y_i = \beta_0 + \beta_1 \cdot x_i + \beta_2 \cdot x_i^2 + \beta^3 \cdot x_i^3 +\beta_4 h(x_i, \xi_1) + \beta_5 h(x_i, \xi_2) + \beta_6\cdot h(x_i, \xi_3) + \epsilon_i$#### Natural Splines- Natural splines impose a boundary constraint on regression splines#### Smoothing Splines- Smoothing splines are regularized splines!- Like with Ridge and Lasso regression, we derive the function from a minimization criteria:$$\underset{g}{\text{arg min}}(\text{RSS} + \lambda \int g^{''}(t)^2dt) $$- Reminder about the Ridge minimization criteria:$$\underset{\beta_j}{\text{arg min}}(\text{RSS} + \lambda (\sum_{j=1}^p \beta_j^2)$$- It's the same! But this time the penalty applies to the second derivative of the polynomial function.- "Shrinkage" of coefficients in more complex regions of the model towards zero- Think of it as a "wiggliness" penalty- In general: - $\lambda \to 0$ we have low bias, high variance (i.e. overfitting) - $\lambda \to \inf$ we have high bias, low variance (i.e. linear regression) - $\lambda \to 0$ then $df_\lambda \to n$ - $\lambda \to \inf$ then $df_\lambda \to 2$- Solving for the degrees of freedom is a bit arcane- It is the sum of the diagonal elements of $\mathbf{S}_{\lambda_{ii}}$$$\hat{g}_\lambda = \mathbf{S}_\lambda y$$$$df_\lambda = \sum_{i=1}^n \mathbf{S}_{\lambda_{ii}}$$ ### Computation of Splines- `ISLP` wraps `scipy.interpolate` to fit into the "transform" workflow```{python}#| echo: truebs_age = MS([bs('age', internal_knots=[25,40,60], name='bs(age)')])Xbs = bs_age.fit_transform(Wage)M = sm.OLS(y, Xbs).fit()summarize(M)``````{python}#| echo: true#| eval: true bs_ = BSpline(internal_knots=[25,40,60], intercept=True).fit(age)bs_age = bs_.transform(age)bs_age.head()```- The `scipy.interpolate()` documentation is...a bit technical- It is left as an exercise to the reader to figure out how to create a model matrix without `ISLP`'s help. Try `splrep()````{python}#| echo: true#| eval: true import scipy.interpolate as interpolate x = np.array([ 0. , 1.2, 1.9, 3.2, 4. , 6.5])y = np.array([ 0. , 2.3, 3. , 4.3, 2.9, 3.1])# t = knots, c = coefficients, k = degreet, c, k = interpolate.splrep(x, y, s=0, k=4)spline = interpolate.BSpline(t, c, k, extrapolate =False)``````{python}#| echo: false#| eval: trueprint('''\t: {}c: {}k: {}'''.format(t, c, k)) ```## GAMs- GAMs extend basis functions to the multivariate case...and more!