7.4 Conceptual Exercises
- … cubic regression spline with a knot at x = \xi with basis functions x, x^{2}, x^{3}, (x-\xi)^{3}_{+}, where
(x-\xi)^{3}_{+} = \begin{cases} (x-\xi)^{3}, & x > \xi \\ 0, & \text{otherwise} \\ \end{cases}
We will show that a function of the form
f(x) = \beta_{0} + \beta_{1}x + \beta_{2}x^{2} + \beta_{3}x^{3} + \beta_{4}(x-\xi)^{3}_{+}
is indeed a cubic regression spline.
- Find a cubic polynomial
f_{1}(x) = a_{1} + b_{1}x + c_{1}x^{2} + d_{1}x^{3}
such that f_{1}(x) = f(x) for all x \leq \xi.
Answer
For x \leq \xi we have (x-\xi)^{3}_{+} = 0, so matching coefficients gives
a_{1} = \beta_{0}, \quad b_{1} = \beta_{1}, \quad c_{1} = \beta_{2}, \quad d_{1} = \beta_{3}
- Find a cubic polynomial
f_{2}(x) = a_{2} + b_{2}x + c_{2}x^{2} + d_{2}x^{3}
such that f_{2}(x) = f(x) for all x > \xi.
Answer
For x > \xi, expanding (x-\xi)^{3} = x^{3} - 3\xi x^{2} + 3\xi^{2}x - \xi^{3} gives f(x) = (\beta_{0} - \beta_{4}\xi^{3}) + (\beta_{1} + 3\beta_{4}\xi^{2})x + (\beta_{2} - 3\beta_{4}\xi)x^{2} + (\beta_{3} + \beta_{4})x^{3}, so
a_{2} = \beta_{0} - \beta_{4}\xi^{3}, \quad b_{2} = \beta_{1} + 3\beta_{4}\xi^{2}, \quad c_{2} = \beta_{2} - 3\beta_{4}\xi, \quad d_{2} = \beta_{3} + \beta_{4}
We have now shown that f(x) is a piecewise polynomial.
- Show that f(x) is continuous at \xi
Answer
Substituting the coefficients found above,
\begin{array}{rcl} f_{1}(\xi) & = & \beta_{0} + \beta_{1}\xi + \beta_{2}\xi^{2} + \beta_{3}\xi^{3} \\ f_{2}(\xi) & = & \beta_{0} - \beta_{4}\xi^{3} + (\beta_{1} + 3\beta_{4}\xi^{2})\xi + (\beta_{2} - 3\beta_{4}\xi)\xi^{2} + (\beta_{3} + \beta_{4})\xi^{3} \\ & = & \beta_{0} + \beta_{1}\xi + \beta_{2}\xi^{2} + \beta_{3}\xi^{3} \\ \end{array}
so f_{1}(\xi) = f_{2}(\xi).
- Show that f'(x) is continuous at \xi
Answer
\begin{array}{rcl} f_{1}'(\xi) & = & \beta_{1} + 2\beta_{2}\xi + 3\beta_{3}\xi^{2} \\ f_{2}'(\xi) & = & (\beta_{1} + 3\beta_{4}\xi^{2}) + 2(\beta_{2} - 3\beta_{4}\xi)\xi + 3(\beta_{3} + \beta_{4})\xi^{2} \\ & = & \beta_{1} + 2\beta_{2}\xi + 3\beta_{3}\xi^{2} \\ \end{array}
so f_{1}'(\xi) = f_{2}'(\xi).
- Show that f''(x) is continuous at \xi
Answer
\begin{array}{rcl} f_{1}''(\xi) & = & 2\beta_{2} + 6\beta_{3}\xi \\ f_{2}''(\xi) & = & 2(\beta_{2} - 3\beta_{4}\xi) + 6(\beta_{3} + \beta_{4})\xi \\ & = & 2\beta_{2} + 6\beta_{3}\xi \\ \end{array}
Hence f, f', and f'' are all continuous at \xi, so f(x) is a cubic spline.
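The algebra above can be double-checked numerically. The following pure-Python sketch (the coefficient values and knot location are arbitrary choices, not part of the exercise) verifies that f_{1} and f_{2} reproduce f on their respective sides of the knot, and that the two pieces agree in value, slope, and curvature at x = \xi.

```python
# Numeric sanity check of the piecewise-cubic derivation (illustrative
# sketch: coefficients and knot are arbitrary).
import random

random.seed(0)
b0, b1, b2, b3, b4 = [random.uniform(-2, 2) for _ in range(5)]
xi = 1.5  # the knot

def f(x):
    # f(x) = b0 + b1 x + b2 x^2 + b3 x^3 + b4 (x - xi)^3_+
    plus = (x - xi) ** 3 if x > xi else 0.0
    return b0 + b1 * x + b2 * x**2 + b3 * x**3 + b4 * plus

def f1(x):  # piece for x <= xi
    return b0 + b1 * x + b2 * x**2 + b3 * x**3

def f2(x):  # piece for x > xi, using the derived coefficients
    a2 = b0 - b4 * xi**3
    bb2 = b1 + 3 * b4 * xi**2
    c2 = b2 - 3 * b4 * xi
    d2 = b3 + b4
    return a2 + bb2 * x + c2 * x**2 + d2 * x**3

# f1 matches f left of the knot, f2 matches f right of it
assert all(abs(f1(x) - f(x)) < 1e-9 for x in [-1.0, 0.0, 1.4, xi])
assert all(abs(f2(x) - f(x)) < 1e-9 for x in [1.6, 2.0, 3.0])

# continuity of f, f', f'' at the knot, via finite differences
d1 = lambda g, x, h=1e-5: (g(x + h) - g(x - h)) / (2 * h)
d2 = lambda g, x, h=1e-4: (g(x + h) - 2 * g(x) + g(x - h)) / h**2
assert abs(f1(xi) - f2(xi)) < 1e-9   # value
assert abs(d1(f1, xi) - d1(f2, xi)) < 1e-6   # slope
assert abs(d2(f1, xi) - d2(f2, xi)) < 1e-4   # curvature
print("all continuity checks pass")
```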
- Suppose that a curve \hat{g} is computed to smoothly fit a set of n points using the following formula:
\hat{g} = \mathop{\mathrm{arg\,min}}_{g} \left( \displaystyle\sum_{i=1}^{n} (y_{i}-g(x_{i}))^{2} + \lambda\displaystyle\int \left[g^{(m)}(x)\right]^{2} \, dx \right)
where g^{(m)} is the m^{\text{th}} derivative of g (and g^{(0)} = g). Describe \hat{g} in each of the following situations.
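The two extreme values of \lambda can be sketched numerically before working through the cases. In the \lambda \to \infty limit the penalty forces g^{(m)} = 0, so \hat{g} is the least squares polynomial of degree m - 1; with \lambda = 0 there is no penalty, so \hat{g} can interpolate the data exactly. The data below are invented purely for illustration.

```python
# Illustrative sketch of the limiting cases (invented data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]
n = len(xs)

# m = 1: g' = 0, so g-hat is the RSS-minimizing constant: the mean of y
g_const = sum(ys) / n

# m = 2: g'' = 0, so g-hat is the ordinary least squares line a + b x
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

# lambda = 0: any interpolant gives zero training RSS; Lagrange
# interpolation through the n points is one such g-hat
def g_interp(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        w = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += yi * w
    return total

rss = lambda g: sum((y - g(x)) ** 2 for x, y in zip(xs, ys))
print("constant fit RSS :", rss(lambda x: g_const))
print("linear fit RSS   :", rss(lambda x: a + b * x))
print("interpolant RSS  :", rss(g_interp))  # zero up to rounding
```

As expected, training RSS shrinks as the constraint is relaxed: constant fit, then line, then exact interpolant.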
- \lambda = \infty, \quad m = 0
Answer
Since g^{(0)} = g, any nonzero function is infinitely penalized, so \hat{g} = 0.
- \lambda = \infty, \quad m = 1
Answer
heavy penalization of all functions except constants (i.e. horizontal lines): \hat{g} = c, the mean of the y_{i}
- \lambda = \infty, \quad m = 2
Answer
heavy penalization of all functions except linear functions, i.e. \hat{g} = a + bx, the least squares line
- \lambda = \infty, \quad m = 3
Answer
heavy penalization of all functions except degree-2 polynomials: \hat{g} = a + bx + cx^{2}, the least squares quadratic
- \lambda = 0, \quad m = 2
Answer
No penalization implies a perfect (interpolating) fit of the training data, with zero training RSS.
Answer
f(x) = \begin{cases} 1 + x, & -2 \leq x \leq 1 \\ 1 + x - 2(x-1)^{2}, & 1 \leq x \leq 2 \\ \end{cases}
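A quick numeric check of this answer (an illustrative sketch; the evaluation points are arbitrary): the two pieces agree in value and in slope at the knot x = 1, so only the curvature changes there.

```python
# Evaluate the piecewise curve f above and check behavior at the knot.
def f(x):
    if -2 <= x <= 1:
        return 1 + x
    if 1 < x <= 2:
        return 1 + x - 2 * (x - 1) ** 2
    raise ValueError("x outside [-2, 2]")

# both pieces give 2 at the knot x = 1 (continuity), and the left slope 1
# equals the right slope 1 - 4(x - 1) evaluated at x = 1
assert f(1) == 1 + 1 - 2 * (1 - 1) ** 2
for x in [-2, -1, 0, 1, 1.5, 2]:
    print(x, f(x))
```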

Answer
f(x) = \begin{cases} 0, & -2 \leq x < 0 \\ 1, & 0 \leq x \leq 1 \\ x, & 1 \leq x \leq 2 \\ 0, & 2 < x < 3 \\ 3x-3, & 3 \leq x \leq 4 \\ 1, & 4 < x \leq 5 \\ \end{cases}
- Consider two curves \hat{g}_{1} and \hat{g}_{2}
\hat{g}_{1} = \mathop{\mathrm{arg\,min}}_{g} \left( \displaystyle\sum_{i=1}^{n} (y_{i}-g(x_{i}))^{2} + \lambda\displaystyle\int \left[g^{(3)}(x)\right]^{2} \, dx \right)
\hat{g}_{2} = \mathop{\mathrm{arg\,min}}_{g} \left( \displaystyle\sum_{i=1}^{n} (y_{i}-g(x_{i}))^{2} + \lambda\displaystyle\int \left[g^{(4)}(x)\right]^{2} \, dx \right)
where g^{(m)} is the m^{\text{th}} derivative of g.
- As \lambda \rightarrow\infty, will \hat{g}_{1} or \hat{g}_{2} have the smaller training RSS?
Answer
As \lambda \rightarrow \infty, \hat{g}_{1} is constrained to g^{(3)} = 0 (a quadratic) while \hat{g}_{2} is only constrained to g^{(4)} = 0 (a cubic). Cubics include quadratics as a special case, so \hat{g}_{2} is more flexible and will have the smaller training RSS.
- As \lambda \rightarrow\infty, will \hat{g}_{1} or \hat{g}_{2} have the smaller testing RSS?
Answer
It depends on the true relationship: if the underlying function is close to a quadratic, \hat{g}_{1} will generally have the smaller testing RSS, while if the truth requires more flexibility, \hat{g}_{2} will tend to do better, though its extra flexibility also raises the risk of overfitting.
- For \lambda = 0, will \hat{g}_{1} or \hat{g}_{2} have the smaller training RSS?