PCA

The key idea of PCA is an eigendecomposition of the covariance matrix:

  • Find the eigenvectors and eigenvalues.
  • Sort eigenvalues from largest to smallest.
  • Truncate at some number of eigenvalues p<N (and their corresponding eigenvectors).
  • Project the predictor vectors (X) onto this smaller basis

$$\alpha^{(n)} = U_p^{T} x^{(n)}$$

where $U_p$ is the $N \times p$ matrix whose columns are the eigenvectors of the smaller basis.

This allows a compression of the data into a smaller set of predictors.
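
As a rough illustration, the steps above can be sketched in NumPy. The function name pca_project, the synthetic data, and the choice to center the predictors before projecting are assumptions made for this sketch, not part of the notes; rows of X are the predictor vectors x(n).

```python
import numpy as np

def pca_project(X, p):
    """Project the rows of X (n_samples x N) onto the top-p eigenvectors.

    Hypothetical helper for illustration only.
    """
    # Center each predictor so the covariance matrix describes variation about the mean.
    X_centered = X - X.mean(axis=0)
    # N x N covariance matrix of the predictors.
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort from largest to smallest eigenvalue.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Truncate: keep the p leading eigenvectors as the columns of U_p (N x p).
    U_p = eigvecs[:, :p]
    # Project: row n of alpha is U_p^T x(n).
    alpha = X_centered @ U_p
    return alpha, U_p, eigvals

# Synthetic data: 200 samples of N = 5 correlated predictors (made up for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
alpha, U_p, eigvals = pca_project(X, p=2)
print(alpha.shape, U_p.shape)
```

Each column of alpha is one of the new, compressed predictors, and its sample variance equals the corresponding eigenvalue.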

Note there are some limitations of PCA:

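  • PCA can perform poorly when the dominant directions of variation in the raw data are not orthogonal, because the principal components are orthogonal by construction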

  • Basis vectors returned by PCA are often hard to interpret, since each one is a linear combination of all the original predictors

  • PCA does not necessarily return the components most influential for the response (it doesn’t depend at all on the response variable.)
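
The last limitation can be seen in a small synthetic experiment. This is only a sketch: the data, the scales, and the variable names are invented for illustration. The response depends only on a low-variance predictor, so the leading principal component, which tracks the high-variance predictor, carries almost no information about it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two predictors: x1 has large variance, x2 has small variance (assumed scales).
x1 = rng.normal(scale=10.0, size=n)
x2 = rng.normal(scale=1.0, size=n)
X = np.column_stack([x1, x2])

# The response depends only on the low-variance predictor x2.
y = 3.0 * x2 + rng.normal(scale=0.1, size=n)

# PCA via eigendecomposition of the covariance matrix; y is never used.
X_centered = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]
scores = X_centered @ eigvecs

# Correlation of each principal component with the response.
corr_pc1 = np.corrcoef(scores[:, 0], y)[0, 1]
corr_pc2 = np.corrcoef(scores[:, 1], y)[0, 1]
print(abs(corr_pc1), abs(corr_pc2))
```

Truncating to p=1 here would keep the first component and discard the second, which is the one that actually predicts y.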