12.5 The matrix decomposition
The svd() function computes the singular value decomposition (SVD) and returns three components, u, d, and v.
sX <- svd(X)
names(sX)
## [1] "d" "u" "v"
round(sX$v, 3)
## [,1] [,2] [,3] [,4]
## [1,] -0.536 -0.418 0.341 0.649
## [2,] -0.583 -0.188 0.268 -0.743
## [3,] -0.278 0.873 0.378 0.134
## [4,] -0.543 0.167 -0.818 0.089
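v is equivalent to the loadings (up to a sign flip), u is equivalent to the standardized scores, and d is the vector of singular values, which are proportional to the standard deviations of the principal components (for a prcomp() fit on the same matrix, the sdev component equals d / sqrt(n - 1)). Multiplying each column of u by the matching element of d recovers the scores: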
t(sX$d * t(sX$u)) %>% head
## [,1] [,2] [,3] [,4]
## [1,] -0.9756604 -1.1220012 0.43980366 0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429 0.7384595 -0.05423025 -0.826264240
## [4,] 0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128 1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407 0.9776297 -1.08400162 0.001450164
pcob$x %>% head
## PC1 PC2 PC3 PC4
## [1,] -0.9756604 -1.1220012 0.43980366 0.154696581
## [2,] -1.9305379 -1.0624269 -2.01950027 -0.434175454
## [3,] -1.7454429 0.7384595 -0.05423025 -0.826264240
## [4,] 0.1399989 -1.1085423 -0.11342217 -0.180973554
## [5,] -2.4986128 1.5274267 -0.59254100 -0.338559240
## [6,] -1.4993407 0.9776297 -1.08400162 0.001450164
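The relationships between svd() and prcomp() can be checked numerically. This is a minimal sketch, assuming X is the centered and scaled USArrests matrix as in the ISLR lab; the names X_chk, sv_chk, pc_chk and scores_chk are introduced here for illustration so they don't overwrite earlier objects. The sign of each component is arbitrary, so the comparisons use absolute values.

```r
# Assumption: the data are the centered and scaled USArrests columns.
X_chk  <- data.matrix(scale(USArrests))
sv_chk <- svd(X_chk)
pc_chk <- prcomp(X_chk)

# v matches the loadings (rotation), up to a sign flip per column.
all.equal(abs(sv_chk$v), abs(unname(pc_chk$rotation)))

# d holds singular values; dividing by sqrt(n - 1) gives the PC standard deviations.
all.equal(sv_chk$d / sqrt(nrow(X_chk) - 1), pc_chk$sdev)

# Scaling each column of u by the matching d gives the scores, again up to sign.
scores_chk <- sweep(sv_chk$u, 2, sv_chk$d, `*`)
all.equal(abs(scores_chk), abs(unname(pc_chk$x)))
```

Each all.equal() call should return TRUE if the assumptions above hold.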
12.5.1 Matrix Completion
Sometimes you want to fill in NAs intelligently.
Technique
- Start with mean imputation per column.
- Use the computed PCA data to impute values.
- Recompute PCA and repeat.
- Technically they use svd() (singular-value decomposition) in the lab, which is called inside the prcomp() function, to more directly demonstrate what’s happening.
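The steps above can be sketched with svd() directly. This is an illustrative version, not the lab's exact code: the helper names low_rank_approx() and impute_svd(), the rank M, the tolerance tol, and the iteration cap max_iter are all assumptions made for this example.

```r
# Hypothetical helper: rank-M approximation of a complete numeric matrix.
low_rank_approx <- function(X, M = 1) {
  s <- svd(X)
  s$u[, 1:M, drop = FALSE] %*% (s$d[1:M] * t(s$v[, 1:M, drop = FALSE]))
}

# Hypothetical helper: iterative imputation for a numeric matrix with NAs.
impute_svd <- function(X_na, M = 1, tol = 1e-7, max_iter = 100) {
  is_miss <- is.na(X_na)

  # Step 1: start from column-mean imputation.
  col_means <- colMeans(X_na, na.rm = TRUE)
  X_hat <- X_na
  X_hat[is_miss] <- col_means[col(X_na)[is_miss]]

  # Baseline errors on the observed entries, used for the stopping rule.
  mss_old <- mean(sweep(X_na, 2, col_means)[!is_miss]^2)
  mss0 <- mean(X_na[!is_miss]^2)

  for (iter in seq_len(max_iter)) {
    # Step 2: approximate the filled-in matrix, then overwrite only the missing cells.
    X_app <- low_rank_approx(X_hat, M)
    X_hat[is_miss] <- X_app[is_miss]

    # Step 3: stop once the fit on the observed entries stops improving.
    mss <- mean((X_na - X_app)[!is_miss]^2)
    if ((mss_old - mss) / mss0 < tol) break
    mss_old <- mss
  }
  X_hat
}

# Small demo on scaled USArrests with ten entries removed (seed chosen arbitrarily).
X_demo <- scale(data.matrix(USArrests))
X_demo_na <- X_demo
set.seed(1)
X_demo_na[cbind(sample(50, 10), sample(4, 10, replace = TRUE))] <- NA
X_demo_filled <- impute_svd(X_demo_na, M = 1)
cor(X_demo_filled[is.na(X_demo_na)], X_demo[is.na(X_demo_na)])
```

Only the missing cells are ever overwritten, so the observed values pass through unchanged; the final correlation is one rough way to judge how well the imputed values track the held-out truth.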
Set up
- First we set up a matrix with missing values.
- The code for this is in the book and not particularly interesting, but I’ve made the names suck less.
- I also don’t scale, because their package does this internally.
arrests <- data.matrix(USArrests)

n_omit <- 20
set.seed(15)
target_rows <- sample(seq(50), n_omit)
target_cols <- sample(1:4, n_omit, replace = TRUE)
targets <- cbind(target_rows, target_cols)
head(targets, 2)
## target_rows target_cols
## [1,] 37 3
## [2,] 47 1
arrests_na <- arrests
arrests_na[targets] <- NA
head(arrests_na, 2)
## state Murder Assault UrbanPop Rape
## [1,] 1 NA 236 58 21.2
## [2,] 2 10 NA 48 44.5
is_missing <- is.na(arrests_na)
The {softImpute} package can do this, so let’s use it!
fit_svd <- softImpute::softImpute(
  arrests_na,
  type = "svd",
  thresh = 1e-16,
  maxit = 3000
)
arrests_imputed <- softImpute::complete(arrests_na, fit_svd, unscale = TRUE)
cor(arrests_imputed[is_missing], arrests[is_missing])
## [1] 0.7249977