1.3 Terminology
- Unsupervised models are used to find patterns, clusters, or other structure among variables (or sets of variables) when there is no outcome variable to predict.
- Examples: PCA, clustering, autoencoders.
- Supervised models have an outcome variable that they are trained to predict.
- Examples: linear regression, decision trees, neural networks.
- Regression: predicts a numeric outcome.
- Classification: predicts an outcome whose values are ordered or unordered qualitative categories.
- Quantitative data: numeric values, such as real numbers or integers.
- Qualitative (nominal) data: non-numeric values that represent discrete categories.
- Qualitative data might still be coded as numbers, e.g. with one-hot encoding or dummy variable encoding.
- Data can have different roles in analyses:
- Outcomes (labels, endpoints, dependent variables): the value being predicted in supervised models.
- Predictors (independent variables): the variables used to predict the outcome.
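- Identifiers: values that label individual rows (e.g. a sample ID) and are kept for reference rather than used as predictors.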
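The data types and roles listed above can be illustrated with a small, made-up table. This is a minimal pure-Python sketch, assuming hypothetical column names (`id`, `color`, `size`, `price`) and hand-written helpers `one_hot` and `dummy` (illustrative only, not a library API). It separates the identifier, predictors, and outcome, then encodes the qualitative predictor as numbers. The choice of reference level for dummy encoding (the first level in sorted order) is also an assumption of this sketch.

```python
# Hypothetical toy data: each row has an identifier, two predictors, and an outcome.
rows = [
    {"id": "a1", "color": "red",   "size": 2.0, "price": 10.5},
    {"id": "a2", "color": "green", "size": 3.5, "price": 14.0},
    {"id": "a3", "color": "blue",  "size": 1.0, "price": 7.25},
    {"id": "a4", "color": "red",   "size": 4.0, "price": 16.0},
]

# Column roles (names are illustrative assumptions).
ID_COL = "id"          # identifier: labels rows, not used to fit the model
OUTCOME_COL = "price"  # outcome: the value a supervised model predicts

def one_hot(values):
    """One-hot encoding: one 0/1 indicator column per level (sorted)."""
    levels = sorted(set(values))
    return levels, [[1 if v == level else 0 for level in levels] for v in values]

def dummy(values):
    """Dummy encoding: like one_hot, but drop the first (reference) level."""
    levels, encoded = one_hot(values)
    return levels[1:], [row[1:] for row in encoded]

ids = [r[ID_COL] for r in rows]        # kept for bookkeeping only
y = [r[OUTCOME_COL] for r in rows]     # outcome values
colors = [r["color"] for r in rows]    # qualitative predictor
sizes = [r["size"] for r in rows]      # quantitative predictor

# Predictor matrix: dummy-encoded color columns followed by size.
dummy_levels, color_dummies = dummy(colors)
X = [d + [s] for d, s in zip(color_dummies, sizes)]
```

Dropping one level in dummy encoding avoids a redundant column: the indicator columns of a one-hot encoding always sum to 1, which duplicates the intercept in models such as linear regression.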
Choosing a model type depends on the question we want to answer (the problem to solve), on the available data, and on other factors.
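The same numeric data can feed either a supervised or an unsupervised model, depending on the question. This is a minimal pure-Python sketch, assuming made-up data and hand-written helpers (`fit_simple_linear_regression` and `kmeans_1d` are hypothetical names, not a library API). A supervised regression uses the numeric outcome `y` to learn a slope and intercept, while unsupervised k-means clustering groups the values of `x` without ever looking at `y`. A classification model would instead need a qualitative outcome.

```python
# Hypothetical data: one quantitative predictor x and a numeric outcome y.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [2.1, 3.9, 6.2, 19.8, 22.1, 24.0]

def fit_simple_linear_regression(x, y):
    """Supervised (regression): ordinary least squares for y ~ b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def kmeans_1d(x, k=2, iters=20):
    """Unsupervised: 1-D k-means clustering; no outcome is involved."""
    # Simple deterministic initialization: evenly spaced sorted values.
    centers = sorted(x)[:: max(1, len(x) // k)][:k]
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: abs(xi - centers[j])) for xi in x]
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

b0, b1 = fit_simple_linear_regression(x, y)  # intercept and slope
labels, centers = kmeans_1d(x, k=2)          # labels → [0, 0, 0, 1, 1, 1]
```

Both functions take `x`, but only the regression also takes `y`; that difference in inputs is exactly what separates supervised from unsupervised models.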