9.8 Nonlinear Classification

  • Many decision boundaries are not linear.
  • We could fit an SVC to the data using \(2p\) features (in the case of \(p\) features and using a quadratic form).

\[X_{1}, X_{1}^{2}, \quad X_{2}, X_{2}^{2}, \quad\cdots, \quad X_{p}, X_{p}^{2}\]

\[\text{max}_{\beta_{0},\beta_{11},\beta_{12},\dots,\beta_{p1},\beta_{p2} \epsilon_{1},\dots,\epsilon_{n}, M} \space M\] \[\text{subject to } y_{i}\left(\beta_{0} + \sum_{j=1}^{p} \beta_{ji}x_{ji} + \sum_{j=1}^{p} \beta_{ji}x_{ji}^{2}\right) \geq M(1 - \epsilon_{i})\]

\[\epsilon_{i} \geq 0, \quad \sum_{i=1}^{n}\epsilon_{i} \leq C, \quad \sum_{j=1}^{p}\sum_{k=1}^{2} \beta_{jk}^{2} = 1\]

  • Note that in the enlarged feature space (here, with the quadratic terms), the decision boundary is linear. But in the original feature space, it is quadratic \(q(x) = 0\) (in this example), and generally the solutions are not linear.
  • One could also include interaction terms, higher degree polynomials, etc., and thus the feature space could enlarge quickly and entail unmanageable computations.