9.4 Mathematics of the Maximal Margin Classifier (MMC)
- Consider constructing an MMC based on \(n\) training observations \(x_{1}, \ldots, x_{n} \in \mathbb{R}^p\) with associated class labels \(y_{1}, \ldots, y_{n} \in \{-1, 1\}\). The MMC is the solution to the following optimization problem (a short code sketch follows this list):
\[\max_{\beta_{0}, \beta_{1}, \ldots, \beta_{p},\, M} \; M\] \[\text{subject to } \sum_{j=1}^{p}\beta_{j}^2 = 1,\] \[y_{i}(\beta_{0} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \ldots + \beta_{p}x_{ip}) \geq M \quad \forall\, i = 1, \ldots, n\]
- \(M\) is the margin, and the \(\beta\) coefficients are chosen to maximize \(M\).
- The constraint in the third equation ensures that each observation is correctly classified, provided \(M\) is positive.
- Taken together, the second and third equations guarantee that each observation is on the correct side of the hyperplane and at least a distance \(M\) from it.
- Because of the constraint \(\sum_{j=1}^{p}\beta_{j}^2 = 1\), the perpendicular distance from the \(i\)th observation to the hyperplane is given by \(y_{i}(\beta_{0} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \ldots + \beta_{p}x_{ip})\).
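To make this concrete, here is a minimal sketch of fitting an MMC on toy separable data. It assumes scikit-learn, whose linear SVC with a very large cost parameter \(C\) approximates the hard-margin problem; scikit-learn actually minimizes \(\|w\|^2\) subject to \(y_i(w \cdot x_i + b) \geq 1\), and rescaling by \(\|w\|\) recovers the \((\beta, M)\) formulation above. The data and variable names here are illustrative, not from the text.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data: 20 points per class in R^2 (made up).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, size=(20, 2)),     # class +1
               rng.normal([-2, -2], 0.5, size=(20, 2))])  # class -1
y = np.array([1] * 20 + [-1] * 20)

# A huge C makes the soft-margin SVC behave like the hard-margin MMC.
clf = SVC(kernel="linear", C=1e10).fit(X, y)

w = clf.coef_[0]                  # unnormalized normal vector
b = clf.intercept_[0]
norm = np.linalg.norm(w)
beta, beta0 = w / norm, b / norm  # rescaled so that sum(beta_j^2) = 1
M = 1.0 / norm                    # the maximal margin

print("beta:", beta, "beta0:", beta0, "M:", M)
# Every training point satisfies y_i(beta0 + x_i . beta) >= M (up to tolerance).
print(np.all(y * (X @ beta + beta0) >= M - 1e-6))
```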
But what if our data cannot be separated by a hyperplane at all?
Individual data points can greatly affect the maximal margin hyperplane, making the classifier sensitive to a change in a single observation (see the sketch below).
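As a hedged illustration of that sensitivity (again using scikit-learn's linear SVC with a very large \(C\) as a stand-in for the hard-margin MMC, on made-up points), adding a single observation near the boundary can rotate the maximal margin hyperplane noticeably:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],         # class +1
              [-2.0, -2.0], [-2.5, -1.5], [-3.0, -2.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

def unit_normal(X, y):
    """Direction of the (approximate) maximal margin hyperplane."""
    w = SVC(kernel="linear", C=1e10).fit(X, y).coef_[0]
    return w / np.linalg.norm(w)

print("before:", unit_normal(X, y))

# One extra +1 observation close to the boundary; the data remain separable,
# but the maximal margin hyperplane rotates substantially.
X2 = np.vstack([X, [[0.2, -0.8]]])
y2 = np.append(y, 1)
print("after: ", unit_normal(X2, y2))
```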