6.1 k-Nearest Neighbors (KNN)

The conceptually simplest way to reason about new data is to assume that examples that have similar features will also have similar targets.

  • KNN applies this assumption directly: it predicts the target of a new data point from the known examples whose features most closely match it (the “nearest neighbors”).
  • How many neighbors?
    • Too few: overfit (predictions track noise in individual examples).
    • Too many: underfit (oversmooth; predictions drift toward the overall average).
  • How do we measure “closeness”?
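    • Pick a distance metric (e.g., Euclidean or Manhattan).
    • (Usually) normalize the variables, so that features measured on larger scales do not dominate the distance.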
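
The points above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a classification target, Euclidean distance, z-score normalization, and majority voting, none of which the notes prescribe. The function names (standardize, knn_predict) and the toy data are made up for the example.

```python
import numpy as np

def standardize(X_train, X_new):
    """Scale each feature to zero mean and unit variance, using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X_train - mean) / std, (X_new - mean) / std

def knn_predict(X_train, y_train, X_new, k=3):
    """Label each row of X_new by majority vote among its k nearest training examples."""
    # Pairwise Euclidean distances, shape (n_new, n_train).
    diffs = X_new[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Column indices of the k smallest distances in each row.
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for neighbor_idx in nearest:
        labels, counts = np.unique(y_train[neighbor_idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

# Toy data (made up): the second feature is on a much larger scale than the first.
X_train = np.array([
    [1.0, 1000.0], [1.2, 1100.0], [0.9, 900.0], [1.1, 1050.0],  # class 0
    [5.0, 5000.0], [5.2, 5100.0], [4.9, 4900.0],                # class 1
])
y_train = np.array([0, 0, 0, 0, 1, 1, 1])
X_new = np.array([[1.05, 1000.0], [5.1, 5000.0]])

X_train_s, X_new_s = standardize(X_train, X_new)

print(knn_predict(X_train_s, y_train, X_new_s, k=3))  # [0 1]
print(knn_predict(X_train_s, y_train, X_new_s, k=7))  # [0 0]
```

With k=3 each new point is labeled by its own cluster. With k=7 every training example votes, so both points receive the majority class; this is the "too many neighbors" failure in its most extreme form. Normalization statistics are computed from the training data only and then reused for the new points.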