12.1 Introduction
Unsupervised learning techniques can be used to reduce the dimension of a dataset. Suppose we have a dataset composed of n observations and p features, that is, an n x p matrix. What we want is to find the combination of our p features that represents as much of the data as possible in a lower-dimensional space.
A brief recap of the difference between supervised and unsupervised learning:
- supervised learning is guided by a response variable. In this case, the dimension reduction analysis is supported by an outcome variable, which serves as a control variable for checking the result of the analysis.
- unsupervised learning, on the contrary, has no response variable to use as a check. The dimension reduction is carried out only on the features/predictors, based on their level of variance. This is why unsupervised learning is a bit more challenging: there is nothing, such as a control variable, to rely on to verify the results.
The main purpose of principal component analysis, as an unsupervised technique, is data visualization: to obtain a view of the whole dataset in a lower-dimensional space and get an idea of the relations within it.
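To make the idea concrete, here is a minimal sketch in Python with NumPy of reducing an n x p matrix to two dimensions using only the features and their variance, with no response variable involved. The function name `pca_project`, the synthetic data, and the choice to center (but not scale) the features are assumptions made for illustration; they are not taken from these notes.

```python
import numpy as np


def pca_project(X, k=2):
    """Project an n x p matrix onto its first k principal components.

    Hypothetical helper for illustration: centers the features, then uses
    the singular value decomposition of the centered matrix.
    """
    # Center each feature so variance is measured around the column mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Coordinates of each observation in the k-dimensional space.
    scores = Xc @ Vt[:k].T
    # Share of the total variance captured by each of the k components.
    explained = (s ** 2 / np.sum(s ** 2))[:k]
    return scores, explained


# Synthetic data (an assumption): two latent factors drive five features.
rng = np.random.default_rng(0)
n, p = 200, 5
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X = latent @ loadings + 0.1 * rng.normal(size=(n, p))

scores, explained = pca_project(X, k=2)
print(scores.shape)   # one row per observation, two columns to plot
print(explained)      # variance share of each retained component
```

The two columns of `scores` can be drawn as a scatter plot to visualize all n observations at once. Note that no outcome variable enters the computation, so the only built-in check on the result is the share of variance retained.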