The key idea of PCA is the eigendecomposition of the covariance matrix:

  • Find the eigenvectors and eigenvalues of the covariance matrix.
  • Sort eigenvalues from largest to smallest.
  • Truncate at some number of eigenvalues \(p < N\) (and their corresponding eigenvectors)
  • Project the predictor vectors (\(\mathbf{X}\)) onto this smaller basis

\[ \boldsymbol{\alpha}^{(n)} = \mathbf{U_p}^T \mathbf{x}^{(n)} \]

where \(\mathbf{U_p}\) is the \(N \times p\) matrix whose columns are the \(p\) retained eigenvectors.

This compresses the data from \(N\) predictors into a smaller set of \(p\) predictors.
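
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `pca_project`, the synthetic data, and the choice to center the predictors before projecting are all assumptions made for the example (the formula above is written in terms of \(\mathbf{x}^{(n)}\) directly).

```python
import numpy as np

def pca_project(X, p):
    """Project the rows of X (M observations x N predictors) onto the top-p eigenvectors."""
    # Assumption: center each predictor so the covariance is taken about the mean.
    Xc = X - X.mean(axis=0)
    # N x N sample covariance matrix of the predictors.
    cov = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort eigenvalues (and matching eigenvectors) from largest to smallest.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Truncate: U_p is the N x p matrix of retained eigenvectors.
    U_p = eigvecs[:, :p]
    # alpha^(n) = U_p^T x^(n) for every observation n, as one matrix product.
    alpha = Xc @ U_p
    return alpha, U_p, eigvals

# Hypothetical data: 200 observations of 5 predictors, two of them nearly collinear.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

alpha, U_p, eigvals = pca_project(X, p=2)
print(alpha.shape)                         # shape of the compressed data
print(eigvals[:2].sum() / eigvals.sum())   # fraction of variance retained
```

The ratio printed on the last line is one common way to choose \(p\): keep enough eigenvalues to retain a chosen fraction of the total variance.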

Note there are some limitations of PCA:

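  • PCA can fail when the important directions of variation in the raw data are not orthogonal, since the principal components are constrained to be mutually orthogonal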

  • Basis vectors returned by PCA are often hard to interpret, since each one is a linear combination of all the original predictors

  • PCA does not necessarily return the components most influential for the response: it is computed from the predictors alone and does not depend at all on the response variable
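
The last limitation can be seen in a small synthetic example. The sketch below is illustrative only: the data, the variable names, and the scales are made up so that the high-variance predictor is unrelated to the response while the low-variance predictor drives it.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 500
x1 = rng.normal(scale=10.0, size=M)   # high variance, unrelated to y
x2 = rng.normal(scale=1.0, size=M)    # low variance, but drives y
X = np.column_stack([x1, x2])
y = 3.0 * x2 + rng.normal(scale=0.1, size=M)

# PCA on the predictors only; y is never used.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
U = eigvecs[:, order]
scores = Xc @ U                       # column 0 is the first principal component

# Correlation of each component's scores with the response.
corr_pc1 = np.corrcoef(scores[:, 0], y)[0, 1]
corr_pc2 = np.corrcoef(scores[:, 1], y)[0, 1]
print(abs(corr_pc1), abs(corr_pc2))
```

In this setup the first component follows the high-variance predictor and is only weakly correlated with the response, while the second component, which truncating to \(p = 1\) would discard, carries almost all of the predictive signal.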