9.9 Support Vector Machines

  • The SVM is an extension of the SVC which results from using kernels to enlarge the feature space. A kernel is a function that quantifies the similarity of two data points.
  • Essentially, we want to enlarge the feature space to make use of a nonlinear decision boundary, while avoiding getting bogged down in unmanageable calculations.
  • The solution to the SVC problem in the SVM context involves only the inner products (AKA dot products) of the observations.

\[\langle x_{i} \; , x_{i'} \; \rangle = \sum_{j=1}^{p}x_{ij}x_{i'j}\]

In the context of SVM, the linear support vector classifier is as follows:

\[f(x) = \beta_{0} + \sum_{i=1}^{n}\alpha_{i}\langle \; x, x_i\; \rangle\]

  • To estimate the \(n\) \(\alpha_{i}\) coefficients and \(\beta_{0}\), we only need the \(\binom{n}{2}\) inner products between all pairs of training observations.
  • Note that in the equation above, in order to compute \(f(x)\) for the new point \(x\), we need the inner product between the new point and all the training observations. However, \(\alpha_{i} = 0\) for all points that are not on or within the margin (i.e., points that are not support vectors). So we can rewrite the equation as follows, where \(S\) is the set of support point indices:

\[f(x) = \beta_{0} + \sum_{i \in S}\alpha_{i}\langle \; x, x_{i} \; \rangle\]

  • Replace every inner product with \(K(x_{i}, x_{i'})\), where \(K\) is a kernel function.
  • \(K(x_{i}, x_{i'}) = \sum_{j=1}^{p}x_{ij}x_{i'j}\) is the SVC and is known as a linear kernel since it is linear in the features.
  • One could also have kernel functions of the following form, where \(d\) is a positive integer:

\[K(x_{i}, x_{i'}) = \left(1 + \sum_{j=1}^{p}x_{ij}x_{i'j}\right)^d\]

  • This will lead to a much more flexible decision boundary, and is basically fitting an SVC in a higher-dimensional space involving polynomials of degree \(d\), instead of the original feature space.
  • When an SVC is combined with a nonlinear kernel as above, the result is a support vector machine.

\[f(x) = \beta_{0} + \sum_{i \in S}\alpha_{i}K(x, x_{i})\]