7.4 Conceptual Exercises

  1. … cubic regression spline with a knot at x = \xi with basis functions

x, \quad x^{2}, \quad x^{3}, \quad (x-\xi)^{3}_{+} \quad \text{where} \quad (x-\xi)^{3}_{+} = \begin{cases} (x-\xi)^{3}, & x > \xi \\ 0, & \text{otherwise} \\ \end{cases}
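This basis can be built numerically as the columns of a design matrix; a minimal numpy sketch (the `cubic_spline_basis` name is our own, not from the text):

```python
import numpy as np

def cubic_spline_basis(x, knot):
    """Design-matrix columns for a cubic spline with one knot:
    x, x^2, x^3, and the truncated cubic (x - knot)^3_+."""
    x = np.asarray(x, dtype=float)
    # (x - knot)^3_+ is zero to the left of the knot, cubic to the right
    truncated = np.where(x > knot, (x - knot) ** 3, 0.0)
    return np.column_stack([x, x ** 2, x ** 3, truncated])
```

Regressing y on these four columns (plus an intercept) fits the spline below by ordinary least squares.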

We will show that a function of the form

f(x) = \beta_{0} + \beta_{1}x + \beta_{2}x^{2} + \beta_{3}x^{3} + \beta_{4}(x-\xi)^{3}_{+}

is indeed a cubic regression spline.

  1. Find a cubic polynomial

f_{1}(x) = a_{1} + b_{1}x + c_{1}x^{2} + d_{1}x^{3}

such that f_{1}(x) = f(x) for all x \leq \xi.

Answer

\begin{array}{rcl} a_{1} & = & \beta_{0} \\ b_{1} & = & \beta_{1} \\ c_{1} & = & \beta_{2} \\ d_{1} & = & \beta_{3} \\ \end{array}

  1. Find a cubic polynomial

f_{2}(x) = a_{2} + b_{2}x + c_{2}x^{2} + d_{2}x^{3}

such that f_{2}(x) = f(x) for all x > \xi.

Answer

\begin{array}{rcl} a_{2} & = & \beta_{0} - \beta_{4}\xi^{3} \\ b_{2} & = & \beta_{1} + 3\beta_{4}\xi^{2} \\ c_{2} & = & \beta_{2} - 3\beta_{4}\xi \\ d_{2} & = & \beta_{3} + \beta_{4} \\ \end{array}

We have now shown that f(x) is a piecewise cubic polynomial.
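The coefficients of f_{2} follow from expanding \beta_{4}(x-\xi)^{3}; a quick symbolic check of that expansion with sympy:

```python
import sympy as sp

x, xi = sp.symbols('x xi')
b0, b1, b2, b3, b4 = sp.symbols('beta0 beta1 beta2 beta3 beta4')

# f on the region x > xi, where (x - xi)^3_+ = (x - xi)^3
f_right = b0 + b1*x + b2*x**2 + b3*x**3 + b4*(x - xi)**3

# f2 built from the claimed coefficients a2, b2, c2, d2
f2 = (b0 - b4*xi**3) + (b1 + 3*b4*xi**2)*x \
     + (b2 - 3*b4*xi)*x**2 + (b3 + b4)*x**3

print(sp.simplify(f_right - f2))  # 0, so f2(x) = f(x) for x > xi
```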

  1. Show that f(x) is continuous at \xi.
Answer

\begin{array}{rcl} f_{1}(\xi) & = & f_{2}(\xi) \\ \beta_{0} + \beta_{1}\xi + \beta_{2}\xi^{2} + \beta_{3}\xi^{3} & = & \beta_{0} + \beta_{1}\xi + \beta_{2}\xi^{2} + \beta_{3}\xi^{3} \\ \end{array}

  1. Show that f'(x) is continuous at \xi.
Answer

\begin{array}{rcl} f_{1}'(\xi) & = & f_{2}'(\xi) \\ \beta_{1} + 2\beta_{2}\xi + 3\beta_{3}\xi^{2} & = & \beta_{1} + 2\beta_{2}\xi + 3\beta_{3}\xi^{2} \\ \end{array}

  1. Show that f''(x) is continuous at \xi.
Answer

\begin{array}{rcl} f_{1}''(\xi) & = & f_{2}''(\xi) \\ 2\beta_{2} + 6\beta_{3}\xi & = & 2\beta_{2} + 6\beta_{3}\xi \\ \end{array}
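All three continuity conditions can be verified at once in sympy: the two pieces differ only by \beta_{4}(x-\xi)^{3}, whose value, first, and second derivatives all vanish at the knot (only the third derivative jumps). A sketch:

```python
import sympy as sp

x, xi = sp.symbols('x xi')
b = sp.symbols('beta0:5')  # beta0 ... beta4

f1 = b[0] + b[1]*x + b[2]*x**2 + b[3]*x**3
f2 = f1 + b[4]*(x - xi)**3  # f on the region x > xi

# f, f', f'' all agree at the knot
for order in range(3):
    gap = sp.diff(f2 - f1, x, order).subs(x, xi)
    print(order, sp.simplify(gap))  # each gap is 0
```

The third derivative jumps by 6\beta_{4} at \xi, which is exactly the one discontinuity a cubic spline is allowed.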

  1. Suppose that a curve \hat{g} is computed to smoothly fit a set of n points using the following formula:

\hat{g} = \mathop{\mathrm{arg\,min}}_{g} \left( \displaystyle\sum_{i=1}^{n} (y_{i}-g(x_{i}))^{2} + \lambda\displaystyle\int \left[g^{(m)}(x)\right]^{2} \, dx \right)

where g^{(m)} is the m^{\text{th}} derivative of g (and g^{(0)} = g). Describe \hat{g} in each of the following situations.

  1. \lambda = \infty, \quad m = 0
Answer The penalty \int g^{2} \, dx is infinite for any nonzero function, so \hat{g} = 0.
  1. \lambda = \infty, \quad m = 1
Answer The penalty forces g' = 0, so \hat{g} is a constant (horizontal line) \hat{g} = a.
  1. \lambda = \infty, \quad m = 2
Answer The penalty forces g'' = 0, so \hat{g} is the least-squares line \hat{g} = a + bx.
  1. \lambda = \infty, \quad m = 3
Answer The penalty forces g''' = 0, so \hat{g} is the least-squares quadratic \hat{g} = a + bx + cx^{2}.
  1. \lambda = 0, \quad m = 2
Answer No penalization: \hat{g} interpolates the training data exactly (zero training RSS when the x_{i} are distinct).
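The two limits for m = 2 can be seen numerically with SciPy's `make_smoothing_spline` (available in SciPy >= 1.10), which minimizes this same criterion with a second-derivative penalty; a sketch under those assumptions:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline  # SciPy >= 1.10

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# lam = 0: no penalty, so the spline interpolates the training data
interp = make_smoothing_spline(x, y, lam=0.0)
print(np.max(np.abs(interp(x) - y)))  # essentially zero residual

# huge lam: the m = 2 penalty forces g'' toward 0, i.e. a straight line
linear = make_smoothing_spline(x, y, lam=1e6)
slopes = np.diff(linear(x)) / np.diff(x)
print(np.ptp(slopes))  # slope is nearly constant across the range
```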

Answer

f(x) = \begin{cases} 1 + x, & -2 \leq x \leq 1 \\ 1 + x - 2(x-1)^{2}, & 1 \leq x \leq 2 \\ \end{cases}
Answer

f(x) = \begin{cases} 0, & -2 \leq x < 0 \\ 1, & 0 \leq x \leq 1 \\ x, & 1 \leq x \leq 2 \\ 0, & 2 < x < 3 \\ 3x-3, & 3 \leq x \leq 4 \\ 1, & 4 < x \leq 5 \\ \end{cases}
  1. Consider two curves \hat{g}_{1} and \hat{g}_{2}

\hat{g}_{1} = \mathop{\mathrm{arg\,min}}_{g} \left( \displaystyle\sum_{i=1}^{n} (y_{i}-g(x_{i}))^{2} + \lambda\displaystyle\int \left[g^{(3)}(x)\right]^{2} \, dx \right)

\hat{g}_{2} = \mathop{\mathrm{arg\,min}}_{g} \left( \displaystyle\sum_{i=1}^{n} (y_{i}-g(x_{i}))^{2} + \lambda\displaystyle\int \left[g^{(4)}(x)\right]^{2} \, dx \right)

where g^{(m)} is the m^{\text{th}} derivative of g.

  1. As \lambda \rightarrow\infty, will \hat{g}_{1} or \hat{g}_{2} have the smaller training RSS?
Answer As \lambda \rightarrow \infty, \hat{g}_{1} is forced toward g^{(3)} = 0 (a quadratic) while \hat{g}_{2} is forced toward g^{(4)} = 0 (a cubic). The cubics nest the quadratics, so \hat{g}_{2} is more flexible and will have the smaller training RSS.
  1. As \lambda \rightarrow\infty, will \hat{g}_{1} or \hat{g}_{2} have the smaller testing RSS?
Answer It depends on the true relationship: if it is close to a quadratic, the less flexible \hat{g}_{1} will likely have the smaller test RSS; if it has more curvature, \hat{g}_{2} will.
  1. For \lambda = 0, will \hat{g}_{1} or \hat{g}_{2} have the smaller training RSS?
Answer With \lambda = 0 both penalty terms vanish, so both problems reduce to minimizing the training RSS alone; \hat{g}_{1} and \hat{g}_{2} both interpolate the data and have equal training RSS (zero when the x_{i} are distinct).
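A rough numerical illustration of part (a): in the \lambda \rightarrow \infty limit, \hat{g}_{1} becomes the least-squares quadratic and \hat{g}_{2} the least-squares cubic, so comparing polynomial fits of degree 2 and 3 shows the training-RSS ordering. A sketch with simulated data (the data-generating function is our own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.size)

# lambda -> infinity limits: ghat1 satisfies g''' = 0 (quadratic),
# ghat2 satisfies g'''' = 0 (cubic)
rss = {}
for name, deg in [('ghat1 (quadratic)', 2), ('ghat2 (cubic)', 3)]:
    coef = np.polyfit(x, y, deg)
    resid = y - np.polyval(coef, x)
    rss[name] = np.sum(resid ** 2)
    print(name, rss[name])

# The cubics nest the quadratics, so the cubic's training RSS
# can only be lower (or equal).
```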