12.5 The matrix decomposition

The svd() function computes the singular value decomposition (SVD) and returns three components: u, d, and v.

sX <- svd(X)
names(sX)

## [1] "d" "u" "v"

round(sX$v, 3)

##        [,1]   [,2]   [,3]   [,4]
## [1,] -0.536 -0.418  0.341  0.649
## [2,] -0.583 -0.188  0.268 -0.743
## [3,] -0.278  0.873  0.378  0.134
## [4,] -0.543  0.167 -0.818  0.089

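v is equivalent to the loadings (up to a sign flip per column), u is equivalent to the standardized scores, and d is the vector of singular values, which are proportional to the standard deviations of the components (the sdev from prcomp() equals d / sqrt(n - 1)). Multiplying each column of u by its singular value recovers the PCA scores, which match pcob$x: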

t(sX$d * t(sX$u)) %>% head

##            [,1]       [,2]        [,3]         [,4]
## [1,] -0.9756604 -1.1220012  0.43980366  0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429  0.7384595 -0.05423025 -0.826264240
## [4,]  0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128  1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407  0.9776297 -1.08400162  0.001450164

pcob$x %>% head

##             PC1        PC2         PC3          PC4
## [1,] -0.9756604 -1.1220012  0.43980366  0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429  0.7384595 -0.05423025 -0.826264240
## [4,]  0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128  1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407  0.9776297 -1.08400162  0.001450164
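
The correspondence between svd() and prcomp() can be checked numerically. The sketch below assumes that X is the scaled USArrests matrix and that pcob is prcomp(X); neither definition appears in this section, so treat both as assumptions. Absolute values are compared because each component is only defined up to a sign flip.

```r
# Assumption: X is the centered and scaled USArrests matrix, pcob its PCA
X <- scale(data.matrix(USArrests))
pcob <- prcomp(X)
sX <- svd(X)

# Loadings: v matches the rotation matrix up to a sign flip per column
max(abs(abs(sX$v) - abs(unname(pcob$rotation))))

# Standard deviations: singular values rescaled by sqrt(n - 1)
all.equal(sX$d / sqrt(nrow(X) - 1), pcob$sdev)

# Scores: multiply each column of u by its singular value
scores <- sweep(sX$u, 2, sX$d, `*`)
max(abs(abs(scores) - abs(unname(pcob$x))))
```

The two max() calls should return values that are zero up to floating-point error.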

12.5.1 Matrix Completion

Sometimes you want to fill in NAs intelligently.

Technique

Start with mean imputation per column.
Compute a PCA of the filled-in matrix and use its low-rank approximation to replace the imputed values.
Recompute the PCA on the updated matrix and repeat until the fit to the observed values stops improving.

Technically the lab uses svd() (singular value decomposition), which prcomp() calls internally, to demonstrate more directly what's happening.
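
The loop described above can be sketched in base R as follows. This is a minimal illustration, not the lab's exact code: the helper name low_rank_approx(), the rank M = 1, the seed, the convergence threshold, and the iteration cap are all assumptions chosen for the example.

```r
# Hypothetical helper: rank-M approximation of a complete matrix via svd()
low_rank_approx <- function(X, M = 1) {
  s <- svd(X)
  s$u[, 1:M, drop = FALSE] %*% (s$d[1:M] * t(s$v[, 1:M, drop = FALSE]))
}

# Assumed setup: scaled USArrests with 20 entries removed at random
X <- scale(data.matrix(USArrests))
set.seed(1)
n_omit <- 20
rows <- sample(nrow(X), n_omit)
cols <- sample(ncol(X), n_omit, replace = TRUE)
X_na <- X
X_na[cbind(rows, cols)] <- NA
is_missing <- is.na(X_na)

# Step 1: mean imputation per column
col_means <- colMeans(X_na, na.rm = TRUE)
X_hat <- X_na
X_hat[is_missing] <- col_means[col(X_na)[is_missing]]

# Steps 2 and 3: re-impute from the low-rank fit until the error on the
# observed entries stops improving (threshold and cap are assumptions)
thresh <- 1e-7
max_iter <- 100
mss0 <- mean(X_na[!is_missing]^2)
mss_old <- mean(sweep(X_na, 2, col_means)[!is_missing]^2)
rel_err <- 1
iter <- 0
while (rel_err > thresh && iter < max_iter) {
  iter <- iter + 1
  X_app <- low_rank_approx(X_hat, M = 1)
  X_hat[is_missing] <- X_app[is_missing]        # only missing cells change
  mss <- mean(((X_na - X_app)[!is_missing])^2)  # error on observed cells
  rel_err <- (mss_old - mss) / mss0
  mss_old <- mss
}

# Compare imputed values with the true values that were removed
cor(X_hat[is_missing], X[is_missing])
```

Only the missing cells are ever overwritten, so the observed data stay fixed while the low-rank fit is refined around them.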

Set up

First we set up a matrix with missing values.
The code for this is in the book and not particularly interesting, but I’ve made the names suck less.
I also don’t scale, because their package does this internally.

arrests <- data.matrix(USArrests)

n_omit <- 20
set.seed(15)
target_rows <- sample(seq(50), n_omit)
target_cols <- sample(1:4, n_omit, replace = TRUE)
targets <- cbind(target_rows, target_cols)
head(targets, 2)

##      target_rows target_cols
## [1,]          37           3
## [2,]          47           1

arrests_na <- arrests
arrests_na[targets] <- NA
head(arrests_na, 2)

##      state Murder Assault UrbanPop Rape
## [1,]     1     NA     236       58 21.2
## [2,]     2     10      NA       48 44.5
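
The assignment arrests_na[targets] <- NA relies on indexing a matrix with a two-column matrix of (row, column) pairs. A toy sketch of that idiom, using a made-up matrix m and made-up cell positions:

```r
# Toy 3 x 4 matrix (illustrative values only)
m <- matrix(1:12, nrow = 3)

# Each row of `cells` is one (row, column) position
cells <- cbind(c(1, 3), c(2, 4))
m[cells] <- NA

# Locate the cells that are now missing
which(is.na(m), arr.ind = TRUE)
```

Each row of the index matrix selects exactly one cell, which is why sampling the rows without replacement in the setup above yields n_omit distinct missing entries.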

is_missing <- is.na(arrests_na)

The {softImpute} package implements this kind of matrix completion, so let's use it!

fit_svd <- softImpute::softImpute(
  arrests_na, 
  type = "svd",
  thresh = 1e-16,
  maxit = 3000
)
arrests_imputed <- softImpute::complete(arrests_na, fit_svd, unscale = TRUE)
cor(arrests_imputed[is_missing], arrests[is_missing])

## [1] 0.7249977