9.12 More than Two Classes

  • The concept of separating hyperplanes does not extend naturally to more than two classes, but there are several ways to apply SVMs when K > 2. The two most common are the one-versus-one and one-versus-all approaches.
  • A one-versus-one approach constructs \binom{K}{2} SVMs, where K is the number of classes. Each SVM compares one pair of classes.
  • For example, one SVM might compare the k^\text{th} class, coded as +1, with the (k')^\text{th} class, coded as -1.
  • A test observation is classified by each of the \binom{K}{2} SVMs, and we count the number of times it is assigned to each of the K classes.
  • The observation is then assigned to the class to which it was most often assigned in these pairwise classifications.
  • Another option is one-versus-all (also called one-versus-rest) classification. It requires fitting only K SVMs rather than \binom{K}{2}, which means far fewer models when there are many classes.
  • We fit K SVMs, each time comparing one of the K classes to the remaining K-1 classes.
  • Let \beta_{0k}, \beta_{1k}, \ldots, \beta_{pk} denote the parameters that result from fitting an SVM comparing the k^\text{th} class (coded as +1) to the other classes (coded as -1).
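  • Assign a test observation x^* to the class k for which \beta_{0k} + \beta_{1k}x^*_1 + \beta_{2k}x^*_2 + \cdots + \beta_{pk}x^*_p is largest, since a large value indicates high confidence that x^* belongs to the k^\text{th} class rather than to any of the others.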
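The one-versus-one voting rule can be sketched as follows. This is a minimal illustration that assumes the \binom{K}{2} pairwise SVMs have already been fitted and are linear, so each one reduces to a decision function b0 + b1*x1 + b2*x2. The coefficients in `PAIRWISE_COEFS` are hypothetical values chosen by hand for K = 3 classes and p = 2 features, not the output of a real fit, and breaking ties in favour of the lowest class index is our own assumption rather than part of the method described above.

```python
from itertools import combinations

# Hypothetical linear decision functions (b0, b1, b2) for each pair (k, k').
# A positive value means "class k" (coded +1); otherwise "class k'" (coded -1).
# Illustrative values only: class 0 sits near the origin, class 1 near x1 = 4,
# class 2 near x2 = 4.
PAIRWISE_COEFS = {
    (0, 1): (2.0, -1.0, 0.0),   # class 0 when x1 < 2, else class 1
    (0, 2): (2.0, 0.0, -1.0),   # class 0 when x2 < 2, else class 2
    (1, 2): (0.0, 1.0, -1.0),   # class 1 when x1 > x2, else class 2
}


def decision_value(coefs, x):
    """Evaluate b0 + b1*x[0] + ... + bp*x[p-1] for one linear classifier."""
    b0, *betas = coefs
    return b0 + sum(b * xj for b, xj in zip(betas, x))


def one_versus_one_predict(x, n_classes, pairwise_coefs):
    """Classify x by majority vote over all K-choose-2 pairwise classifiers."""
    votes = [0] * n_classes
    for k, k_prime in combinations(range(n_classes), 2):
        score = decision_value(pairwise_coefs[(k, k_prime)], x)
        # The +1 side votes for class k; a score of 0 or below votes for k'.
        winner = k if score > 0 else k_prime
        votes[winner] += 1
    # max() returns the first maximal index, so ties go to the lowest class.
    best = max(range(n_classes), key=lambda c: votes[c])
    return best, votes


print(one_versus_one_predict((4.0, 0.5), 3, PAIRWISE_COEFS))  # (1, [1, 2, 0])
```

With K = 3 there are exactly three pairwise classifiers; for a test point near (4, 0.5), the "class 1" side wins two of the three contests, so the vote count is [1, 2, 0] and the point is assigned to class 1.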
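The one-versus-all decision rule, assigning x^* to the class k with the largest \beta_{0k} + \beta_{1k}x^*_1 + \cdots + \beta_{pk}x^*_p, can be sketched in a few lines. As a labelled assumption, the K coefficient vectors in `OVA_COEFS` are hypothetical, hand-picked values for K = 3 classes and p = 2 features rather than parameters from an actual SVM fit; the sketch only shows how the fitted values would be combined.

```python
# Hypothetical one-versus-all coefficients (beta_0k, beta_1k, beta_2k).
# Row k stands in for the SVM fitted as "class k (+1) vs all other classes (-1)".
# Illustrative values only, not fitted to data.
OVA_COEFS = [
    (2.0, -1.0, -1.0),   # class 0: large when both x1 and x2 are small
    (-2.0, 1.0, -0.5),   # class 1: large when x1 is large
    (-2.0, -0.5, 1.0),   # class 2: large when x2 is large
]


def ova_score(coefs, x):
    """Compute beta_0k + beta_1k * x[0] + ... + beta_pk * x[p-1]."""
    b0, *betas = coefs
    return b0 + sum(b * xj for b, xj in zip(betas, x))


def one_versus_all_predict(x, ova_coefs):
    """Assign x to the class whose one-versus-all score is largest."""
    scores = [ova_score(c, x) for c in ova_coefs]
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores


print(one_versus_all_predict((2.5, 2.0), OVA_COEFS))  # (1, [-2.5, -0.5, -1.25])
```

Note that the rule always produces a class: for the point (2.5, 2.0) every one of the K classifiers returns a negative score (each says "not my class"), yet the argmax still assigns it to class 1, whose score is the least negative.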