9.8 Nonlinear Classification

Many decision boundaries are not linear.
We could fit an SVC to the data using $2p$ features (in the case of $p$ features and using a quadratic form).

$X_{1}, X_{1}^{2}, \quad X_{2}, X_{2}^{2}, \quad\cdots, \quad X_{p}, X_{p}^{2}$

$\text{max}_{\beta_{0},\beta_{11},\beta_{12},\dots,\beta_{p1},\beta_{p2} \epsilon_{1},\dots,\epsilon_{n}, M} \space M$ $\text{subject to } y_{i}\left(\beta_{0} + \sum_{j=1}^{p} \beta_{ji}x_{ji} + \sum_{j=1}^{p} \beta_{ji}x_{ji}^{2}\right) \geq M(1 - \epsilon_{i})$

$\epsilon_{i} \geq 0, \quad \sum_{i=1}^{n}\epsilon_{i} \leq C, \quad \sum_{j=1}^{p}\sum_{k=1}^{2} \beta_{jk}^{2} = 1$

Note that in the enlarged feature space (here, with the quadratic terms), the decision boundary is linear. But in the original feature space, it is quadratic $q(x) = 0$ (in this example), and generally the solutions are not linear.
One could also include interaction terms, higher degree polynomials, etc., and thus the feature space could enlarge quickly and entail unmanageable computations.