9.3 Maximal Margin Classifier

  • In general, if the data can be perfectly separated by a hyperplane, then infinitely many such hyperplanes exist.
  • A natural choice is the maximal margin hyperplane: the separating hyperplane that is farthest from the training observations.
  • We compute the perpendicular distance from each training observation to the hyperplane. The smallest of these distances is known as the margin.
  • The maximal margin hyperplane is the separating hyperplane for which the margin is largest. Classifying a test observation according to which side of this hyperplane it lies on gives the maximal margin classifier.
  • The maximal margin classifier classifies x based on the sign of f(x) = β_0 + β_1 x_1 + ... + β_p x_p.
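
The margin and the sign rule above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the toy data, the two candidate hyperplanes, the ±1 class coding, and the helper names signed_distance, margin, and classify are assumptions made for the example and are not part of the notes.

```python
import numpy as np

def signed_distance(X, beta0, beta):
    """Signed perpendicular distance from each row of X to the
    hyperplane beta0 + beta . x = 0."""
    beta = np.asarray(beta, dtype=float)
    return (beta0 + X @ beta) / np.linalg.norm(beta)

def margin(X, y, beta0, beta):
    """Smallest distance from a training observation to the hyperplane.
    Labels y are assumed to be coded as +1 / -1; the result is negative
    if the hyperplane misclassifies any observation."""
    return float(np.min(y * signed_distance(X, beta0, beta)))

def classify(X, beta0, beta):
    """Assign +1 or -1 according to the sign of f(x)."""
    f = beta0 + X @ np.asarray(beta, dtype=float)
    return np.where(f >= 0, 1, -1)

# Hypothetical, linearly separable training data with p = 2.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])

# Two candidate separating hyperplanes with the same normal direction.
m_a = margin(X, y, -2.0, [1.0, 1.0])  # x1 + x2 - 2 = 0
m_b = margin(X, y, -1.0, [1.0, 1.0])  # x1 + x2 - 1 = 0

print("margin of candidate a:", m_a)
print("margin of candidate b:", m_b)
print("class of (4, 1) under a:", classify(np.array([[4.0, 1.0]]), -2.0, [1.0, 1.0]))
```

Both candidates separate the data, but candidate a leaves a larger margin, so it is the better of the two under the maximal margin criterion. The sketch only compares given hyperplanes; finding the maximal margin hyperplane itself requires solving an optimization problem.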

  • In the accompanying figure, three training observations lie on the margin and are equidistant from the hyperplane. These are the support vectors (they are vectors in p-dimensional space; in this case p = 2).
  • They "support" the maximal margin hyperplane in the sense that if any of them were moved, the hyperplane would move as well.
  • The maximal margin hyperplane depends directly on the support vectors only. Moving any other observation leaves it unchanged, provided that observation does not cross onto or inside the margin.
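
The dependence on the support vectors alone can be checked numerically. The sketch below assumes scikit-learn is available and uses SVC with a linear kernel and a very large C as a stand-in for the maximal margin classifier (an approximation, since SVC fits a soft-margin model). The toy data and the helper fit_hyperplane are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data (p = 2).
X = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 4.0],
              [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def fit_hyperplane(X, y):
    """Fit a linear SVM with a very large C (approximating a hard margin)
    and return (beta0, beta, support_vectors)."""
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    return clf.intercept_[0], clf.coef_[0], clf.support_vectors_

b0, b, sv = fit_hyperplane(X, y)

# Move an observation that is not a support vector, keeping it
# well outside the margin.
X_far = X.copy()
X_far[2] = [6.0, 5.0]
b0_far, b_far, _ = fit_hyperplane(X_far, y)

# Move the support vector at (2, 2) closer to the other class.
X_sv = X.copy()
X_sv[0] = [1.0, 1.0]
b0_sv, b_sv, _ = fit_hyperplane(X_sv, y)

print("support vectors:\n", sv)
print("original:            ", b0, b)
print("non-support moved:   ", b0_far, b_far)
print("support vector moved:", b0_sv, b_sv)
```

The first two fits should agree up to solver tolerance, while the third yields different coefficients, which matches the claim that only observations on the margin determine the hyperplane.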