Method: Nearest neighbours

There are two important questions:

  • How many neighbours should we choose?
    • Fewer neighbours \(=\) a more local analysis.
    • More neighbours \(=\) less variability in the results.
    • About 20 neighbours often works well, depending on the size of the dataset.
  • What metric should be used to measure the “proximity” of observations?
    • If the data is numeric we could use:
      • Euclidean distance
      • Manhattan distance
      • Minkowski distance
      • Chebyshev distance
      • Cosine similarity (converted to a distance, e.g. 1 − similarity)
    • As the number of predictors grows, the results will differ more from metric to metric.
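
To make the effect of the metric concrete, here is a minimal sketch that finds the nearest neighbours of one query point under each of the metrics listed above. The function names (`distances`, `nearest_neighbours`), the toy data, and the Minkowski order `p=3` are assumptions made for illustration only; they do not come from the original material. Cosine similarity is turned into a dissimilarity as 1 − similarity so that "smaller means closer" holds for every metric.

```python
import numpy as np

def distances(x, data, metric="euclidean", p=3):
    """Distance from the query x to every row of data (illustrative helper)."""
    diff = np.abs(data - x)  # absolute coordinate-wise differences
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=1))
    if metric == "manhattan":
        return diff.sum(axis=1)
    if metric == "minkowski":
        # General form; p=1 gives Manhattan, p=2 gives Euclidean.
        return (diff ** p).sum(axis=1) ** (1.0 / p)
    if metric == "chebyshev":
        return diff.max(axis=1)
    if metric == "cosine":
        # Assumption: use 1 - cosine similarity as the dissimilarity.
        sim = data @ x / (np.linalg.norm(data, axis=1) * np.linalg.norm(x))
        return 1.0 - sim
    raise ValueError(f"unknown metric: {metric}")

def nearest_neighbours(x, data, k, metric="euclidean", p=3):
    """Indices of the k rows of data closest to x under the chosen metric."""
    d = distances(x, data, metric, p)
    return np.argsort(d, kind="stable")[:k]

# Toy data (made up for this example): three observations, two predictors.
data = np.array([
    [4.0, 1.0],   # index 0
    [3.0, 3.0],   # index 1
    [1.0, 3.5],   # index 2
])
query = np.array([1.0, 1.0])

for metric in ["euclidean", "manhattan", "minkowski", "chebyshev", "cosine"]:
    idx = nearest_neighbours(query, data, k=1, metric=metric)
    print(f"{metric:10s} -> nearest observation: index {idx[0]}")
```

In this toy example the Euclidean and Manhattan distances pick observation 2 as the single nearest neighbour, while the Chebyshev distance and the cosine dissimilarity pick observation 1, so even with two predictors the chosen metric can change which observations count as "near".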