6.2.2 - Lasso penalty
The lasso (least absolute shrinkage and selection operator) penalty is an alternative to the ridge penalty that requires only a small modification: the penalty sums the absolute values of the coefficients rather than their squares. The objective function is:
\[\begin{equation} \operatorname{minimize}\left(S S E+\lambda \sum_{j=1}^p |\beta_j|\right) \end{equation}\]
Whereas the ridge penalty shrinks coefficients toward zero without ever setting them exactly to zero, the lasso penalty can push coefficients all the way to zero, as illustrated in Figure 6.3.
In the figure above we see that when λ<0.01, all 15 variables are included in the model; when λ≈0.5, 9 variables are retained; and when log(λ)=1, only 5 variables are retained.
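The contrast between the two penalties can be seen in closed form in a simplified setting. The sketch below assumes an orthonormal design matrix, which is an assumption made only for illustration and not something the text above requires. Under that assumption, each coefficient can be solved for separately: the lasso solution soft-thresholds the least-squares estimate, while the ridge solution rescales it. The function names and the least-squares coefficients are made up for this example.

```python
def lasso_shrink(b, lam):
    """Minimizer of (b - beta)**2 + lam * abs(beta): soft-thresholding at lam / 2."""
    magnitude = max(abs(b) - lam / 2.0, 0.0)
    return magnitude if b >= 0 else -magnitude


def ridge_shrink(b, lam):
    """Minimizer of (b - beta)**2 + lam * beta**2: proportional shrinkage."""
    return b / (1.0 + lam)


# Hypothetical least-squares coefficients (illustrative values only).
ols = [3.0, -1.5, 0.4, -0.2, 0.05]

for lam in (0.0, 0.5, 1.0, 4.0):
    lasso = [lasso_shrink(b, lam) for b in ols]
    ridge = [ridge_shrink(b, lam) for b in ols]
    # Count the coefficients that are still nonzero under each penalty.
    n_lasso = sum(1 for c in lasso if c != 0.0)
    n_ridge = sum(1 for c in ridge if c != 0.0)
    print(f"lambda={lam}: lasso keeps {n_lasso}, ridge keeps {n_ridge}")
```

In this toy setting, the number of nonzero lasso coefficients falls as λ grows, while every ridge coefficient shrinks but stays nonzero. This is the same qualitative pattern as the coefficient paths in the figure, although real data with correlated predictors generally needs an iterative solver rather than this closed form.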