Method: Nearest neighbours

There are two important questions:

How many neighbours should we choose?
- Fewer neighbours $=$ a more local analysis.
- More neighbours $=$ less variability in the results.
- About 20 neighbours could work well (depending on the dataset size).
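
To illustrate the trade-off in the number of neighbours, here is a minimal sketch, assuming a nearest-neighbour average on a single numeric predictor. The function `knn_predict` and the toy data ($y = x^2$ on a grid) are made up for illustration and are not part of the method as stated above.

```python
def knn_predict(train_x, train_y, query, k):
    """Average the responses of the k training points closest to `query`."""
    # Rank training indices by absolute distance to the query (1-D predictor).
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical toy data: y = x^2 on the grid -2.0, -1.9, ..., 2.0 (41 points).
train_x = [i / 10 for i in range(-20, 21)]
train_y = [x * x for x in train_x]

# Predict at x = 0, where the true value is 0.
for k in (1, 5, 41):
    print(k, round(knn_predict(train_x, train_y, 0.0, k), 3))
```

With `k = 1` the prediction uses only the closest point, with `k = 5` it averages a small neighbourhood around the query, and with `k = 41` it averages the whole dataset, so the prediction no longer reflects what happens near $x = 0$.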
What metric should be used to measure the “proximity” of observations?
- If the data is numeric we could use:
  - Euclidean distance
  - Manhattan distance
  - Minkowski distance
  - Chebyshev distance
  - Cosine similarity (a similarity rather than a distance: larger values mean closer observations)
- With more predictors, the results will vary more from one metric to another.
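
The metrics listed above can be sketched in plain Python as follows. The function names and the example points are assumptions chosen for illustration. The last part shows how the nearest neighbour of a query point can change with the metric.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean(a, b):
    return minkowski(a, b, 2)

def manhattan(a, b):
    return minkowski(a, b, 1)

def chebyshev(a, b):
    """Largest coordinate-wise difference (the limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical query point and two candidate neighbours.
query = (0, 0)
candidates = {"A": (3, 3), "B": (0, 4)}

def nearest(metric):
    """Name of the candidate closest to the query under the given metric."""
    return min(candidates, key=lambda name: metric(query, candidates[name]))

for metric in (euclidean, manhattan, chebyshev):
    print(metric.__name__, nearest(metric))
```

In this example the Euclidean and Manhattan distances pick `B` as the nearest neighbour, while the Chebyshev distance picks `A`, so the choice of metric changes which observations count as neighbours even with only two predictors.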