Method: Nearest neighbours
There are two important questions:
- How many neighbours should we choose?
  - Fewer neighbours \(=\) a more local analysis.
  - More neighbours \(=\) less variability in the results.
  - About 20 neighbours can work well, depending on the size of the dataset.
- What metric should be used to measure the “proximity” of observations?
  - If the data is numeric, we could use:
    - Euclidean distance
    - Manhattan distance
    - Minkowski distance
    - Chebyshev distance
    - Cosine similarity
  - With more predictors, the results will differ more from one metric to another.
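
To make the two questions concrete, here is a minimal sketch in plain Python that treats the number of neighbours `k` and the metric as the two parameters of a single search function. The function names (`k_nearest`, `cosine_distance`, and so on) and the toy data are assumptions made up for this illustration, not part of any particular library. Cosine similarity is turned into a distance as 1 minus the similarity, so that smaller always means closer.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    return minkowski(a, b, 2)

def manhattan(a, b):
    return minkowski(a, b, 1)

def chebyshev(a, b):
    """Largest coordinate-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; undefined if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def k_nearest(query, points, k, metric):
    """Indices of the k points closest to `query` under `metric`."""
    order = sorted(range(len(points)), key=lambda i: metric(query, points[i]))
    return order[:k]

# Toy data (invented for illustration): three observations, two predictors.
points = [(4.0, 4.0), (1.0, 5.0), (4.5, 2.0)]
query = (1.0, 1.0)

metrics = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "cosine": cosine_distance,
}

# The single nearest neighbour under each metric.
nearest = {name: k_nearest(query, points, 1, fn) for name, fn in metrics.items()}
print(nearest)
```

On this toy data the single nearest neighbour is observation 2 under Euclidean distance, observation 1 under Manhattan distance, and observation 0 under both Chebyshev and cosine distance, so the choice of metric changes the answer even with only two predictors. Raising `k` widens the neighbourhood: with `k = 3` every metric returns all three observations and only their ordering differs.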