9.4 Mathematics of the MMC

  • Consider constructing an MMC based on the training observations \(x_1, \dots, x_n \in \mathbb{R}^p\) with class labels \(y_1, \dots, y_n \in \{-1, 1\}\). The maximal margin hyperplane is the solution to the optimization problem:

\[
\max_{\beta_0, \beta_1, \dots, \beta_p,\, M} M
\quad \text{subject to} \quad
\sum_{j=1}^{p} \beta_j^2 = 1,
\qquad
y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}) \ge M \quad \text{for } i = 1, \dots, n.
\]

  • M is the margin, and the coefficients \(\beta_0, \beta_1, \dots, \beta_p\) are chosen to maximize M.
  • The constraint \(y_i(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}) \ge M\) ensures that each observation is correctly classified, as long as M is positive.

  • Together, the two constraints ensure that each data point is on the correct side of the hyperplane and at least a distance M away from it; M is therefore the margin of the hyperplane.
  • Given the normalization \(\sum_{j=1}^{p} \beta_j^2 = 1\), the perpendicular distance from the i-th observation to the hyperplane is given by \(y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip})\).
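As a sketch, the optimization above can be approximated with scikit-learn's `SVC` using a linear kernel and a very large cost parameter C, which leaves essentially no budget for margin violations (the toy data set below is hypothetical, and the exact solver is a stand-in for the hard-margin problem rather than the formulation above solved directly):

```python
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn is available

# Hypothetical, clearly separable two-class data in R^2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, (20, 2)),
               rng.normal([-2, -2], 0.3, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

# A very large C approximates the hard-margin (maximal margin) classifier
clf = SVC(kernel="linear", C=1e10).fit(X, y)
beta = clf.coef_.ravel()
beta0 = clf.intercept_[0]

# Rescale so that sum(beta_j^2) = 1; then y_i * (beta0 + x_i . beta)
# is the signed perpendicular distance of x_i from the hyperplane
scale = np.linalg.norm(beta)
beta, beta0 = beta / scale, beta0 / scale
distances = y * (beta0 + X @ beta)
M = distances.min()  # the margin: the smallest distance to the hyperplane
print(f"margin M = {M:.3f}")
```

After rescaling, every training point satisfies the constraint \(y_i(\beta_0 + x_i^\top \beta) \ge M\) with a positive M, which is exactly what the optimization problem demands.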

But what if our data cannot be separated by a hyperplane?

Individual data points can greatly affect the maximal margin classifier: because the hyperplane depends only on the observations that lie closest to it, adding or moving a single point near the boundary can change it dramatically.
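This sensitivity can be illustrated with the same hard-margin approximation via scikit-learn's `SVC`: refitting after adding one hypothetical observation near the boundary shrinks the margin and rotates the hyperplane (the data, the added point, and the helper `fit_mmc` are all illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC  # assumption: scikit-learn is available

def fit_mmc(X, y):
    """Approximate the maximal margin classifier with a hard-margin SVM
    (a very large C leaves essentially no budget for violations)."""
    clf = SVC(kernel="linear", C=1e10).fit(X, y)
    beta = clf.coef_.ravel()
    beta0 = clf.intercept_[0]
    scale = np.linalg.norm(beta)        # rescale so sum(beta_j^2) = 1
    beta, beta0 = beta / scale, beta0 / scale
    margin = (y * (beta0 + X @ beta)).min()
    return beta, margin

# Hypothetical, well-separated two-class data in R^2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([2, 2], 0.3, (20, 2)),
               rng.normal([-2, -2], 0.3, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
beta_before, margin_before = fit_mmc(X, y)

# One extra observation close to the old boundary, labelled -1
X2 = np.vstack([X, [[0.0, 0.5]]])
y2 = np.append(y, -1)
beta_after, margin_after = fit_mmc(X2, y2)

# A single point shrinks the margin and rotates the hyperplane
rotation = np.degrees(np.arccos(np.clip(beta_before @ beta_after, -1, 1)))
print(f"margin: {margin_before:.2f} -> {margin_after:.2f}, "
      f"hyperplane rotated by {rotation:.1f} degrees")
```

The margin can only shrink when a constraint is added, and here the new point sits far inside the old margin, so the fitted hyperplane must move to accommodate it. This fragility motivates the support vector classifier, which allows some observations to violate the margin.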