9.4 Mathematics of the MMC
- Consider constructing an MMC based on the training observations $x_1, \dots, x_n \in \mathbb{R}^p$. It is the solution to the optimization problem:
$$
\begin{aligned}
&\max_{\beta_0, \beta_1, \dots, \beta_p,\, M} \; M \\
&\text{subject to } \sum_{j=1}^{p} \beta_j^2 = 1, \\
&\quad y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip}) \ge M \quad \forall\, i = 1, \dots, n
\end{aligned}
$$
- M is the margin, and the β coefficients are chosen to maximize M.
- The inequality constraint guarantees that each observation is correctly classified, as long as M is positive.
- Together, the norm constraint and the inequality constraint ensure that each data point is on the correct side of the hyperplane and at least a distance M away from it.
- Under the norm constraint $\sum_{j=1}^{p} \beta_j^2 = 1$, the perpendicular distance from the i-th observation to the hyperplane is given by $y_i(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip})$.
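As a concrete sketch (not part of the original notes), the MMC can be approximated with scikit-learn's `SVC` by using a very large `C`, which makes the soft-margin SVM behave like a hard-margin classifier on separable data. The toy data and the value of `C` below are illustrative assumptions. Note that sklearn solves the problem under the scaling $y_i(w \cdot x_i + b) \ge 1$, so we rescale to recover the text's normalization $\sum_j \beta_j^2 = 1$, under which the margin is $M = 1/\|w\|$.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (hypothetical values for illustration)
X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 2.0],
              [4.0, 4.5], [5.0, 5.0], [4.5, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (maximal margin) classifier.
clf = SVC(kernel="linear", C=1e10).fit(X, y)

w = clf.coef_.ravel()    # sklearn's scaling: y_i (w . x_i + b) >= 1
b = clf.intercept_[0]

# Rescale so that sum(beta_j^2) = 1, matching the notes' constraint.
beta = w / np.linalg.norm(w)
beta0 = b / np.linalg.norm(w)
M = 1.0 / np.linalg.norm(w)  # the maximal margin

# Each y_i (beta0 + beta . x_i) is the signed perpendicular distance;
# every observation lies at least M from the hyperplane.
margins = y * (X @ beta + beta0)
print("margin M =", M, " smallest distance =", margins.min())
```

The smallest of the computed distances is attained by the support vectors, for which the inequality constraint holds with equality.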
But what if our data is not separable by a linear hyperplane?
Individual data points can greatly affect the maximal margin classifier: a single observation close to the opposite class can dramatically change the hyperplane and shrink the margin.
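The sensitivity to single observations can be illustrated numerically (a sketch with made-up data, again approximating the hard margin via a large `C` in scikit-learn): adding one point near the opposite class sharply reduces the fitted margin.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated classes (hypothetical toy data)
X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1, -1, 1, 1])

def fitted_margin(X, y):
    """Maximal margin M = 1/||w|| from a near-hard-margin linear SVM."""
    clf = SVC(kernel="linear", C=1e10).fit(X, y)
    return 1.0 / np.linalg.norm(clf.coef_)

m_before = fitted_margin(X, y)

# Add a single +1 observation close to the -1 class; data stay separable,
# but the hyperplane must move and the margin shrinks.
X2 = np.vstack([X, [2.5, 2.5]])
y2 = np.append(y, 1)
m_after = fitted_margin(X2, y2)

print("margin before:", m_before, " margin after:", m_after)
```

One added point cuts the margin substantially, which is the instability that motivates the soft-margin support vector classifier.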