1.3 Terminology

  • Unsupervised models learn patterns, groupings, or relationships among variables without reference to an outcome variable.
    • Examples: PCA, clustering, autoencoders.
  • Supervised models have an outcome variable.
    • Examples: linear regression, decision trees, neural networks.
    • Regression: the outcome is numeric.
    • Classification: the outcome is an ordered or unordered set of qualitative values.
  • Quantitative data: numbers.
  • Qualitative (nominal) data: discrete states that cannot naturally be placed on a numeric scale.
    • Qualitative data might still be coded as numbers, e.g. with one-hot encoding or dummy variable encoding.
  • Data can have different roles in analyses:
    • Outcomes (labels, endpoints, dependent variables): the value being predicted in supervised models.
    • Predictors (independent variables): the variables used to predict the outcome.
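    • Identifiers: variables that label observations (e.g. an ID column) and are neither predicted nor used to predict.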
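
As an illustration of the encoding and role terms above, here is a minimal Python sketch. The toy rows, the column names (id, color, size_cm, price), and the helper functions one_hot and dummy are all hypothetical and exist only for this example. It assumes the common convention in which one-hot encoding creates one indicator column per level, while dummy variable encoding drops one reference level (here, the first level alphabetically).

```python
# Hypothetical toy data: one identifier, two predictors, one numeric outcome.
rows = [
    {"id": "a1", "color": "red",   "size_cm": 4.0, "price": 10.5},
    {"id": "a2", "color": "green", "size_cm": 2.5, "price": 7.0},
    {"id": "a3", "color": "blue",  "size_cm": 3.0, "price": 8.2},
]


def one_hot(value, levels):
    """Return one 0/1 indicator per level of a qualitative variable."""
    return [1 if value == level else 0 for level in levels]


def dummy(value, levels):
    """Like one_hot, but drop the first level, which acts as the reference."""
    return one_hot(value, levels)[1:]


# The qualitative predictor's levels, in a fixed (alphabetical) order.
levels = sorted({row["color"] for row in rows})

# Split the columns by role.
identifiers = [row["id"] for row in rows]   # labels rows, not used for prediction
outcome = [row["price"] for row in rows]    # numeric, so this is a regression problem

# Predictors: the encoded qualitative column plus the quantitative column.
predictors_one_hot = [one_hot(r["color"], levels) + [r["size_cm"]] for r in rows]
predictors_dummy = [dummy(r["color"], levels) + [r["size_cm"]] for r in rows]

for ident, x, y in zip(identifiers, predictors_dummy, outcome):
    print(ident, x, y)
```

With three colors, the one-hot version adds three columns and the dummy version adds two; the reference level is represented by all zeros in the dummy columns.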

The choice of model type depends on the question we want to answer or the problem we want to solve, and on the available data, among other things.