12.5 The matrix decomposition
The singular value decomposition (SVD): the svd() function returns three components, u, d, and v, which satisfy X = u %*% diag(d) %*% t(v).
sX <- svd(X)
names(sX)
## [1] "d" "u" "v"
round(sX$v, 3)
##        [,1]   [,2]   [,3]   [,4]
## [1,] -0.536 -0.418  0.341  0.649
## [2,] -0.583 -0.188  0.268 -0.743
## [3,] -0.278  0.873  0.378  0.134
## [4,] -0.543  0.167 -0.818  0.089
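v holds the loadings (it matches pcob$rotation, up to sign), and u holds the scores rescaled so that each column has unit length. d is not a matrix but a vector of singular values, which are proportional to the standard deviations of the principal components: pcob$sdev equals sX$d / sqrt(n - 1). Multiplying each column of u by its singular value therefore recovers the scores, which match pcob$x.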
t(sX$d * t(sX$u)) %>% head
##            [,1]       [,2]        [,3]         [,4]
## [1,] -0.9756604 -1.1220012  0.43980366  0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429  0.7384595 -0.05423025 -0.826264240
## [4,]  0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128  1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407  0.9776297 -1.08400162  0.001450164
pcob$x %>% head
##             PC1        PC2         PC3          PC4
## [1,] -0.9756604 -1.1220012  0.43980366  0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429  0.7384595 -0.05423025 -0.826264240
## [4,]  0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128  1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407  0.9776297 -1.08400162  0.001450164
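To check all of these relationships in one place, here is a minimal self-contained sketch. It assumes that X in this section is the scaled USArrests data, as in the book's lab (that setup is not shown here), so it rebuilds it with scale(USArrests); the names s, pc, and scores are my own. The signs of singular vectors are arbitrary, so loadings and scores are compared in absolute value.

```r
# Sketch: relate svd() output to prcomp() output
# Assumption: X is the column-scaled USArrests data, as in the book's lab
X <- scale(USArrests)
s <- svd(X)
pc <- prcomp(X)
n <- nrow(X)

# Loadings: v matches the rotation matrix (up to sign)
all.equal(abs(s$v), abs(unname(pc$rotation)))  # TRUE

# Scores: u with column j scaled by d[j], i.e. u %*% diag(d)
scores <- s$u %*% diag(s$d)
all.equal(abs(scores), abs(unname(pc$x)))  # TRUE

# d holds singular values; the PC standard deviations are d / sqrt(n - 1)
all.equal(s$d / sqrt(n - 1), pc$sdev)  # TRUE

# The three pieces reconstruct X: X = u diag(d) t(v)
all.equal(s$u %*% diag(s$d) %*% t(s$v), X, check.attributes = FALSE)  # TRUE
```

Comparing sdev and the reconstruction is sign-free, so those two checks are the most robust way to confirm the relationship.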
12.5.1 Matrix Completion
Sometimes you want to fill in missing values (NAs) more intelligently than by plugging in a column mean.
Technique
- Start with mean imputation per column.
- Fit a low-rank approximation (the first few principal components) to the filled-in matrix and use its fitted values to replace the missing entries.
- Recompute the PCA on the updated matrix and repeat until the fit converges.
- Technically, the lab uses svd() (singular value decomposition), which prcomp() calls internally, to show more directly what’s happening.
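The steps above can be sketched by hand with svd(). This is a minimal sketch in the spirit of the book's lab, not its exact code: the helper names low_rank() and impute_svd(), the relative-change stopping rule, and the choice M = 1 are all my own assumptions.

```r
# Hypothetical helper: best rank-M approximation of a complete matrix via svd()
low_rank <- function(X, M = 1) {
  s <- svd(X)
  # u_M %*% diag(d_M) %*% t(v_M), written without building diag()
  s$u[, 1:M, drop = FALSE] %*% (s$d[1:M] * t(s$v[, 1:M, drop = FALSE]))
}

# Hypothetical helper: iterative SVD imputation
impute_svd <- function(X_na, M = 1, tol = 1e-7, max_iter = 100) {
  miss <- is.na(X_na)
  X_hat <- X_na
  # Step 1: fill each NA with the mean of its column
  X_hat[miss] <- colMeans(X_na, na.rm = TRUE)[col(X_na)[miss]]
  for (iter in seq_len(max_iter)) {
    # Step 2: fit a rank-M approximation to the current filled-in matrix
    approx_mat <- low_rank(X_hat, M)
    # Relative change in the imputed cells, used as the stopping rule
    change <- sum((approx_mat[miss] - X_hat[miss])^2) / sum(X_hat[miss]^2)
    # Step 3: overwrite only the missing cells; observed cells are untouched
    X_hat[miss] <- approx_mat[miss]
    if (change < tol) break
  }
  X_hat
}

# Usage: scaled USArrests with 20 cells knocked out at random
set.seed(15)
X <- scale(USArrests)
X_na <- X
X_na[cbind(sample(nrow(X), 20), sample(ncol(X), 20, replace = TRUE))] <- NA
X_imp <- impute_svd(X_na, M = 1)
cor(X_imp[is.na(X_na)], X[is.na(X_na)])
```

Only the missing cells are ever overwritten, so the observed data act as fixed anchors while the low-rank fit pulls the imputed values toward the structure shared across columns.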
Set up
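- First we set up a matrix with missing values.
- The code for this is in the book and not particularly interesting, but I’ve made the names suck less.
- I also don’t scale the data here. Note that softImpute() itself does not center or scale; the package provides biScale() for that, and complete(..., unscale = TRUE) undoes that scaling only when biScale() was applied.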
arrests <- data.matrix(USArrests)
n_omit <- 20
set.seed(15)
target_rows <- sample(seq(50), n_omit)
target_cols <- sample(1:4, n_omit, replace = TRUE)
targets <- cbind(target_rows, target_cols)
head(targets, 2)
##      target_rows target_cols
## [1,]          37           3
## [2,]          47           1
arrests_na <- arrests
arrests_na[targets] <- NA
head(arrests_na, 2)
##      state Murder Assault UrbanPop Rape
## [1,]     1     NA     236       58 21.2
## [2,]     2     10      NA       48 44.5
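The arrests_na[targets] <- NA step relies on R’s matrix indexing: when you index a matrix with a two-column matrix, each row is read as a (row, column) pair. Here is a toy illustration on a made-up 2 x 3 matrix; the names m, idx, and vals are only for this example.

```r
# Toy 2 x 3 matrix: row 1 is 1 3 5, row 2 is 2 4 6
m <- matrix(1:6, nrow = 2)
# Two (row, col) pairs, one per row of idx
idx <- cbind(row = c(1, 2), col = c(3, 1))
vals <- m[idx]  # picks m[1, 3] and m[2, 1], i.e. 5 and 2
m[idx] <- NA    # sets exactly those two cells to NA
which(is.na(m), arr.ind = TRUE)  # recovers the (row, col) positions
```

Because each row of the index matrix addresses a single cell, sampling rows and columns separately and binding them with cbind() knocks out individual cells rather than whole rows or columns.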
is_missing <- is.na(arrests_na)
The {softImpute} package does exactly this, so let’s use it!
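# rank.max (default 2) and lambda (default 0) are left at their defaults
fit_svd <- softImpute::softImpute(
  arrests_na,
  type = "svd",    # iterative SVD algorithm
  thresh = 1e-16,  # convergence threshold
  maxit = 3000     # maximum number of iterations
)
# Fill the NAs in arrests_na with the values from the low-rank fit
arrests_imputed <- softImpute::complete(arrests_na, fit_svd, unscale = TRUE)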
cor(arrests_imputed[is_missing], arrests[is_missing])
## [1] 0.7249977