9.3 Maximal Margin Classifier

In general, if the data can be perfectly separated by a hyperplane, then infinitely many such hyperplanes exist.
A natural choice is the maximal margin hyperplane, which is the separating hyperplane that is farthest from the training observations.
We compute the perpendicular distance from each training observation to the hyperplane. The smallest of these distances is known as the margin.
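As a minimal sketch of the margin computation, the snippet below uses the standard point-to-hyperplane distance $|\beta_0 + \beta^T x| / \lVert \beta \rVert$ and takes the smallest value over the training set. The toy data and the two candidate hyperplanes (`h_a`, `h_b`) are made up for illustration, and the function names are hypothetical.

```python
import math

def distance_to_hyperplane(beta0, beta, x):
    """Perpendicular distance from x to the hyperplane beta0 + beta . x = 0."""
    value = beta0 + sum(b * xi for b, xi in zip(beta, x))
    norm = math.sqrt(sum(b * b for b in beta))
    return abs(value) / norm

def margin(beta0, beta, X):
    """Smallest perpendicular distance from any training observation."""
    return min(distance_to_hyperplane(beta0, beta, x) for x in X)

# Hypothetical toy training set in p = 2 dimensions.
# The first two points belong to one class, the last two to the other.
X = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]

# Two candidate hyperplanes that both separate the classes perfectly.
h_a = (-5.0, (1.0, 1.0))  # x1 + x2 = 5
h_b = (-3.0, (1.0, 1.0))  # x1 + x2 = 3

margin_a = margin(*h_a, X)
margin_b = margin(*h_b, X)
print(margin_a, margin_b)
```

Both hyperplanes separate the data, but `h_a` has the wider margin, so it sits farther from the training observations than `h_b`.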
The maximal margin hyperplane is the separating hyperplane for which the margin is largest. We can then classify a test observation based on which side of the maximal margin hyperplane it lies on; this rule is known as the maximal margin classifier.
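The maximal margin classifier classifies a test observation $x^*$ based on the sign of $f(x^*) = \beta_{0} + \beta_{1}x^*_{1} + \dots + \beta_{p}x^*_{p}$, where $\beta_{0}, \beta_{1}, \dots, \beta_{p}$ are the coefficients of the maximal margin hyperplane.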
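The sign rule can be sketched in a few lines. This assumes the two classes are coded as $+1$ and $-1$ and that the coefficients have already been found; the coefficient values below are illustrative only, and assigning $f(x^*) = 0$ to the $-1$ class is an arbitrary choice made for this sketch.

```python
def f(beta0, beta, x):
    """Evaluate f(x) = beta0 + beta1*x1 + ... + betap*xp."""
    return beta0 + sum(b * xi for b, xi in zip(beta, x))

def classify(beta0, beta, x_star):
    """Return +1 if f(x*) > 0, otherwise -1 (tie handling is an assumption)."""
    return 1 if f(beta0, beta, x_star) > 0 else -1

# Illustrative coefficients for the hyperplane x1 + x2 = 5 (not fitted here).
beta0, beta = -5.0, (1.0, 1.0)

label_a = classify(beta0, beta, (4.5, 3.5))  # f = 3.0, positive side
label_b = classify(beta0, beta, (0.5, 1.0))  # f = -3.5, negative side
print(label_a, label_b)
```

The magnitude of $f(x^*)$ also tells us how far $x^*$ lies from the hyperplane, so a value far from zero suggests a more confident class assignment.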

Note the three training observations that lie on the margin and are equidistant from the hyperplane. These are the support vectors (vectors in $p$-dimensional space; in this case $p=2$).
They "support" the maximal margin hyperplane in the sense that if any of them were moved, the hyperplane would move as well.
The maximal margin hyperplane depends directly on these observations but not on the others: any other observation can be moved without affecting the hyperplane, as long as it is not moved onto or inside the margin.
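To see this dependence concretely, here is a deliberately simplified sketch in one dimension ($p=1$, not the $p=2$ case discussed above). In one dimension a "hyperplane" is just a threshold, and for separable data the maximal margin threshold is the midpoint between the closest observations of opposite classes, which are the two support vectors. The data and the function name are hypothetical, and the code assumes every $-1$ observation lies to the left of every $+1$ observation.

```python
def maximal_margin_threshold(xs, ys):
    """Return (threshold, margin) for separable 1D data with labels -1 / +1.

    Assumes all -1 observations lie strictly left of all +1 observations.
    """
    left = max(x for x, y in zip(xs, ys) if y == -1)   # support vector, class -1
    right = min(x for x, y in zip(xs, ys) if y == 1)   # support vector, class +1
    if left >= right:
        raise ValueError("data are not separable in the assumed orientation")
    return (left + right) / 2, (right - left) / 2

ys = [-1, -1, -1, 1, 1, 1]

# Baseline: the support vectors are x = 2 and x = 6.
base = maximal_margin_threshold([0.0, 1.0, 2.0, 6.0, 7.0, 9.0], ys)

# Moving a non-support observation (9 -> 20) outside the margin changes nothing.
moved_far = maximal_margin_threshold([0.0, 1.0, 2.0, 6.0, 7.0, 20.0], ys)

# Moving a support vector (6 -> 8) shifts the threshold.
moved_sv = maximal_margin_threshold([0.0, 1.0, 2.0, 8.0, 7.0, 9.0], ys)

# Moving a non-support observation inside the margin (7 -> 5) also shifts it.
moved_inside = maximal_margin_threshold([0.0, 1.0, 2.0, 6.0, 5.0, 9.0], ys)

print(base, moved_far, moved_sv, moved_inside)
```

Only the last two changes alter the result. In both of them the set of closest opposite-class observations changes, while the first change leaves it untouched.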