6.1 k-Nearest Neighbors (KNN)

The conceptually simplest way to reason about new data is to assume that examples that have similar features will also have similar targets.

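KNN applies this assumption directly: to make a prediction for a new example, it finds the k known examples whose features most closely match it (the “nearest neighbors”) and predicts from their targets, typically by majority vote for classification or by averaging for regression.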
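
As a minimal sketch of this idea, assuming Euclidean distance and a majority vote for classification (neither choice is fixed by the text), a KNN classifier can be written in a few lines. The function name `knn_predict` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest examples."""
    # Distance from the query to every known example (Euclidean is an assumption).
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k examples with the smallest distance.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration: two well-separated clusters in 2D.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
print(prediction)
```

Note that there is no training step: the "model" is just the stored examples, and all the work happens at prediction time.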
How many neighbors?
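- Too few: predictions follow the noise in individual examples (overfit).
- Too many: predictions average over dissimilar examples and lose local structure (underfit, oversmooth).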
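
The trade-off can be illustrated with a small sketch. The one-dimensional data set below is made up, with one deliberately mislabeled point at x = 2, and `knn_predict` is a hypothetical helper using Euclidean distance and majority vote. It compares a very small k, a moderate k, and a k equal to the size of the data set.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training examples nearest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda xy: math.dist(xy[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up data: class "a" on the left, class "b" on the right,
# plus one noisy "b" label at x = 2 inside the "a" region.
train_X = [(0,), (1,), (2,), (3,), (4,), (10,), (11,), (12,), (13,)]
train_y = ["a", "a", "b", "a", "a", "b", "b", "b", "b"]

query = (2.1,)  # lies inside the "a" region, right next to the noisy point
predictions = {k: knn_predict(train_X, train_y, query, k) for k in (1, 3, 9)}
print(predictions)
```

With k = 1 the prediction copies the single noisy neighbor; with k = 3 the two surrounding "a" examples outvote it; with k = 9 every query receives the overall majority class regardless of where it lies. In practice k is usually chosen by evaluating several values on held-out data.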
How do we measure “closeness”?
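- Pick a distance metric (for example, Euclidean or Manhattan distance).
- (Usually) normalize the variables, so that features with large numeric ranges do not dominate the distance.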
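
The sketch below illustrates both points under stated assumptions: Euclidean and Manhattan distance as example metrics, and z-score standardization as one common way to normalize. The helper names and the (height in meters, income) data are hypothetical.

```python
import math
import statistics

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def fit_standardizer(rows):
    """Per-feature (mean, standard deviation) computed from the known examples."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*rows)]

def standardize(row, stats):
    # Assumes no feature is constant (a zero standard deviation would divide by zero).
    return tuple((value - mean) / std for value, (mean, std) in zip(row, stats))

def nearest_index(rows, query, metric):
    """Index of the row closest to `query` under the given metric."""
    return min(range(len(rows)), key=lambda i: metric(rows[i], query))

# Made-up examples: (height in meters, yearly income).
train_X = [(1.60, 50000), (1.95, 50100), (1.75, 90000), (1.80, 20000)]
query = (1.61, 50100)

# Raw features: income differences swamp height differences.
raw_nearest = nearest_index(train_X, query, euclidean)

# Standardized features: both variables contribute on a comparable scale.
stats = fit_standardizer(train_X)
scaled_X = [standardize(row, stats) for row in train_X]
scaled_nearest = nearest_index(scaled_X, standardize(query, stats), euclidean)

print(raw_nearest, scaled_nearest)
```

On the raw features the nearest neighbor is the example with an almost identical income but a very different height; after standardization it is the example that is close on both variables. The query is scaled with the statistics of the known examples, not its own.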