6.1 K-Nearest Neighbors (KNN)
The conceptually simplest way to reason about new data is to assume that examples that have similar features will also have similar targets.
- KNN applies this assumption directly: it predicts the target of a new example from the k known examples whose features match it most closely (the “nearest neighbors”).
- How many neighbors (k)?
- Too few: predictions track noise in individual examples (overfit).
- Too many: predictions average over dissimilar examples (underfit, oversmooth).
- How do we measure “closeness”?
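- Pick a distance metric (e.g., Euclidean or Manhattan).
- (Usually) normalize the variables, so that features with large numeric ranges do not dominate the distance.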
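
The points above can be put together in a small sketch. It is only an illustration under stated assumptions, not a reference implementation: the function names (`standardize`, `knn_predict`), the toy data, z-score standardization as the normalization, Euclidean distance as the metric, and majority voting for classification are all choices made for this example.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each feature column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has standard deviation 0; fall back to 1 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    scaled = [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]
    return scaled, means, stds


def knn_predict(train_x, train_y, query, k=3, distance=math.dist):
    """Return the majority label among the k training examples nearest to query."""
    scaled_x, means, stds = standardize(train_x)
    # Scale the query with the statistics of the training data.
    scaled_q = tuple((x - m) / s for x, m, s in zip(query, means, stds))
    ranked = sorted(zip(scaled_x, train_y), key=lambda pair: distance(pair[0], scaled_q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the second feature has a much larger numeric range than the first.
train_x = [(1.0, 1000.0), (1.2, 1100.0), (3.0, 5000.0), (3.2, 5200.0),
           (2.9, 4800.0), (3.1, 5100.0), (2.8, 4900.0)]
train_y = ["a", "a", "b", "b", "b", "b", "b"]
query = (1.1, 1050.0)

small_k = knn_predict(train_x, train_y, query, k=1)  # only the closest example votes
large_k = knn_predict(train_x, train_y, query, k=7)  # every example votes
```

On this toy data the query sits next to the two "a" examples, so k=1 returns "a". With k=7 every training example votes and the larger "b" class wins, which is the oversmoothing case from the list above. Passing a different `distance` function (any callable taking two points) swaps the metric without changing the rest of the sketch.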