12.5 The matrix decomposition
The singular value decomposition (SVD):
The svd() function returns a list with three components: u, d, and v.
sX <- svd(X)
names(sX)
## [1] "d" "u" "v"
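X is not defined in this section. The sketch below assumes it is the centered and scaled USArrests matrix (`scale(data.matrix(USArrests))`), which fits the 4 x 4 shape of v, and uses its own names (`x_std`, `sv`) so it doesn't overwrite anything. It checks what the three pieces are: d is a plain vector of singular values in decreasing order, u and v have orthonormal columns, and multiplying them back together rebuilds the matrix.

```r
# Assumption: x_std stands in for X (centered and scaled USArrests)
x_std <- scale(data.matrix(USArrests))
sv <- svd(x_std)

dim(sv$u)     # 50 rows, 4 columns
length(sv$d)  # 4 singular values (a vector, not a matrix)
dim(sv$v)     # 4 rows, 4 columns

# Singular values are non-negative and sorted from largest to smallest
d_sorted <- all(sv$d >= 0) && !is.unsorted(rev(sv$d))

# u and v have orthonormal columns: t(u) %*% u and t(v) %*% v are identity matrices
u_orthonormal <- isTRUE(all.equal(crossprod(sv$u), diag(4)))
v_orthonormal <- isTRUE(all.equal(crossprod(sv$v), diag(4)))

# Multiplying the pieces back together recovers the data: X = u diag(d) t(v)
x_rebuilt <- sv$u %*% diag(sv$d) %*% t(sv$v)
reconstructs <- max(abs(x_rebuilt - x_std)) < 1e-10
```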
round(sX$v, 3)
## [,1] [,2] [,3] [,4]
## [1,] -0.536 -0.418 0.341 0.649
## [2,] -0.583 -0.188 0.268 -0.743
## [3,] -0.278 0.873 0.378 0.134
## [4,] -0.543 0.167 -0.818 0.089
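v matches the loadings (pcob$rotation). u holds the scores rescaled so that each column has unit length, and d is a vector (not a matrix) of singular values; d / sqrt(n - 1) gives the component standard deviations (pcob$sdev). Because X = U D V' and V'V = I, the scores are XV = UD, so multiplying each column of u by the matching element of d recovers the principal component scores: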
t(sX$d * t(sX$u)) %>% head
## [,1] [,2] [,3] [,4]
## [1,] -0.9756604 -1.1220012 0.43980366 0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429 0.7384595 -0.05423025 -0.826264240
## [4,] 0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128 1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407 0.9776297 -1.08400162 0.001450164
These match the scores that prcomp() stores in pcob$x:
pcob$x %>% head
## PC1 PC2 PC3 PC4
## [1,] -0.9756604 -1.1220012 0.43980366 0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429 0.7384595 -0.05423025 -0.826264240
## [4,] 0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128 1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407 0.9776297 -1.08400162 0.001450164
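The printout above only compares the scores. A compact way to check the other correspondences (loadings and standard deviations) is sketched below. It assumes, as a stand-in for X and pcob, the scaled USArrests matrix and a fresh prcomp() fit; the names `x_std`, `sv`, and `pca` are hypothetical. Singular vectors are only unique up to a sign flip per column, so the comparisons use absolute values.

```r
# Assumption: x_std / pca stand in for X / pcob from the lab
x_std <- scale(data.matrix(USArrests))
n <- nrow(x_std)
sv <- svd(x_std)
pca <- prcomp(x_std)

# Columns of u have length 1 (variance 1/(n - 1)), so u alone is not the scores
u_lengths <- sqrt(colSums(sv$u^2))

# Compare absolute values because each column's sign is arbitrary
same_loadings <- isTRUE(all.equal(abs(sv$v), abs(unname(pca$rotation))))
same_scores <- isTRUE(all.equal(abs(sweep(sv$u, 2, sv$d, "*")), abs(unname(pca$x))))
same_sdev <- isTRUE(all.equal(sv$d / sqrt(n - 1), pca$sdev))

c(loadings = same_loadings, scores = same_scores, sdev = same_sdev)
```

`sweep(sv$u, 2, sv$d, "*")` does the same column scaling as `t(sX$d * t(sX$u))`, just more explicitly.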
12.5.1 Matrix Completion
Sometimes you want to fill in NAs intelligently.
Technique
- Start with mean imputation per column.
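- Compute a low-rank PCA approximation (the first M components) of the filled-in matrix and use it to replace only the missing entries.
- Recompute the PCA on the updated matrix and repeat until the imputed values stop changing.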
- Technically, the lab uses svd() (singular value decomposition), which prcomp() calls internally, to show more directly what's happening.
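The loop in the technique list can be sketched in base R with svd() alone. This is a minimal sketch, not the book's code or softImpute's: the helper names `low_rank_approx()` and `hard_impute()`, the default rank M = 1, and the stopping rule (largest change to any imputed cell below `tol`) are my own illustrative choices.

```r
# Hypothetical helper: rank-M reconstruction u[, 1:M] diag(d[1:M]) t(v[, 1:M])
low_rank_approx <- function(X, M = 1) {
  s <- svd(X)
  # d[1:M] * t(v) scales row j of t(v) by d[j], i.e. diag(d) %*% t(v)
  s$u[, 1:M, drop = FALSE] %*% (s$d[1:M] * t(s$v[, 1:M, drop = FALSE]))
}

# Hypothetical helper: iterative low-rank imputation of a numeric matrix with NAs
hard_impute <- function(X_na, M = 1, tol = 1e-7, max_iter = 500) {
  miss <- is.na(X_na)
  X_hat <- X_na
  # Step 1: replace each NA with the mean of the observed values in its column
  X_hat[miss] <- colMeans(X_na, na.rm = TRUE)[col(X_na)[miss]]
  for (iter in seq_len(max_iter)) {
    # Step 2: rank-M approximation of the current filled-in matrix
    X_app <- low_rank_approx(X_hat, M)
    # Largest change this pass makes to any imputed cell
    change <- max(abs(X_app[miss] - X_hat[miss]))
    # Step 3: overwrite only the missing cells; observed values stay fixed
    X_hat[miss] <- X_app[miss]
    if (change < tol) break
  }
  X_hat
}

# Usage on scaled USArrests with 10 cells blanked out (illustrative seed)
x_std <- scale(data.matrix(USArrests))
set.seed(1)
cells <- cbind(sample(nrow(x_std), 10), sample(ncol(x_std), 10, replace = TRUE))
x_na <- x_std
x_na[cells] <- NA
x_filled <- hard_impute(x_na, M = 1)
cor(x_filled[cells], x_std[cells])
```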
Setup
- First we set up a matrix with missing values.
- The code for this is in the book and not particularly interesting, but I’ve made the names suck less.
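- I also don't scale. Note that softImpute() does not center or scale the matrix on its own; that is the job of softImpute::biScale(), and the unscale argument of complete() only reverses scaling that biScale() applied.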
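arrests <- data.matrix(USArrests)
n_omit <- 20                                        # number of cells to blank out
set.seed(15)
target_rows <- sample(seq(50), n_omit)              # 20 distinct rows (no replacement)
target_cols <- sample(1:4, n_omit, replace = TRUE)  # one column per chosen row
targets <- cbind(target_rows, target_cols)          # two-column (row, col) index matrix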
head(targets, 2)
## target_rows target_cols
## [1,] 37 3
## [2,] 47 1
arrests_na <- arrests
arrests_na[targets] <- NA
head(arrests_na, 2)
## state Murder Assault UrbanPop Rape
## [1,] 1 NA 236 58 21.2
## [2,] 2 10 NA 48 44.5
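The `arrests_na[targets] <- NA` line relies on R's matrix indexing: when you index a matrix with a two-column matrix, each row of the index is read as one (row, column) pair. A toy example (the names `m`, `idx`, `picked`, and `mask` are just for illustration):

```r
m <- matrix(1:9, nrow = 3)          # filled column by column: column 2 is 4, 5, 6
idx <- cbind(c(1, 3), c(2, 1))      # two (row, col) pairs: (1, 2) and (3, 1)
picked <- m[idx]                    # the values 4 and 3
m[idx] <- NA                        # blank out exactly those two cells
mask <- is.na(m)                    # TRUE only at [1, 2] and [3, 1]
```

Because target_rows is sampled without replacement, every (row, col) pair in targets is distinct, so exactly n_omit cells end up missing.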
is_missing <- is.na(arrests_na)
The {softImpute} package is built to do exactly this, so let's use it!
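fit_svd <- softImpute::softImpute(
  arrests_na,
  type = "svd",    # full SVD each iteration instead of the default "als"
  thresh = 1e-16,  # convergence threshold
  maxit = 3000     # maximum number of iterations
)                  # rank.max and lambda are left at their defaults (2 and 0)
# Fill the NAs in arrests_na with values from the fitted model
arrests_imputed <- softImpute::complete(arrests_na, fit_svd, unscale = TRUE)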
cor(arrests_imputed[is_missing], arrests[is_missing])
## [1] 0.7249977
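To put that correlation in context, it can help to compute the same number for plain column-mean imputation (step 1 of the technique on its own). Pooling cells from every column into one correlation partly rewards getting each column's scale right, since Assault values run into the hundreds while Murder stays below 20. The sketch below uses base R's USArrests with the same seed and sampling calls as the setup, but the head() printout above shows an extra state column, so this may not be the exact matrix used there; treat the result as a rough reference. The `ua*` names and `baseline_cor` are hypothetical.

```r
ua <- data.matrix(USArrests)   # base R version: 4 numeric columns
set.seed(15)
ua_cells <- cbind(sample(seq(50), 20), sample(1:4, 20, replace = TRUE))
ua_na <- ua
ua_na[ua_cells] <- NA
ua_missing <- is.na(ua_na)

# Baseline: fill each NA with the mean of the observed values in its column
ua_mean <- ua_na
ua_mean[ua_missing] <- colMeans(ua_na, na.rm = TRUE)[col(ua_na)[ua_missing]]

baseline_cor <- cor(ua_mean[ua_missing], ua[ua_missing])
baseline_cor
```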