K-Means Clustering
Divide the data into a predefined number K of clusters by minimizing the sum of squared distances from each record to the mean of its assigned cluster.
Steps:
- Start by randomly selecting K points as the initial centroids.
- Assign every record to the cluster with the closest centroid, then update each cluster's centroid (the mean of its assigned records).
- Repeat the assignment and update steps until the cluster means stop moving.
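The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `k_means` and `squared_distance`, the `max_iter` and `seed` parameters, the toy records, and the choice to leave an empty cluster's centroid where it was are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two records of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(records, k, max_iter=100, seed=0):
    """Return (centroids, labels) after running the assign/update loop."""
    rng = random.Random(seed)
    # Step 1: pick K distinct records at random as the initial centroids.
    centroids = [tuple(r) for r in rng.sample(records, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2a: assign every record to its closest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(r, centroids[j]))
            for r in records
        ]
        # Step 2b: recompute each centroid as the mean of its assigned records.
        new_centroids = []
        for j in range(k):
            members = [r for r, label in zip(records, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 3: stop once the means no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy data: two visibly separate groups of three records each.
records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(records, k=2)
print(centroids)
print(labels)
```

Which group ends up as cluster 0 depends on the random seed, so only the grouping itself is meaningful, not the label numbers.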
The algorithm is not guaranteed to find the best possible solution: it can settle on a local minimum that depends on the initial centroids. There is also no standard method for finding the optimal K.
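A small one-dimensional sketch shows how the starting centroids can change the result. The function `lloyd_1d`, the helper `nearest`, the six data points, and both sets of starting centroids are made up for illustration. The starting centroids are passed in explicitly, rather than drawn at random, so the outcome is reproducible.

```python
def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: (p - centroids[j]) ** 2)


def lloyd_1d(points, start, max_iter=100):
    """Run the assign/update loop on 1-D data from the given starting centroids.

    Returns (final centroids, sum of squared distances to the assigned centroid).
    """
    centroids = [float(c) for c in start]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    sse = sum((p - centroids[label]) ** 2 for p, label in zip(points, labels))
    return centroids, sse


points = [0, 1, 10, 11, 20, 21]  # three obvious pairs

good = lloyd_1d(points, start=[0, 10, 20])  # one starting centroid per pair
bad = lloyd_1d(points, start=[0, 1, 15])    # two starting centroids in the first pair

print(good)  # ([0.5, 10.5, 20.5], 1.5)
print(bad)   # ([0.0, 1.0, 15.5], 101.0)
```

Both runs stop because the means no longer move, yet the second one has a much larger sum of squared distances. One common way to reduce this risk is to run the algorithm from several random starts and keep the run with the smallest sum.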
Interpreting the clusters mainly involves looking at the sizes of the clusters and their means. We're trying to see whether the clusters will hold up on new data.
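As a rough sketch of that inspection, the snippet below tabulates the size and per-feature means of each cluster. The `records` and `labels` values are hypothetical stand-ins for the output of a clustering run, and `summary` is just a name chosen for this example.

```python
from collections import defaultdict

# Hypothetical records (two features each) and the cluster label each one received.
records = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (8.0, 9.0), (8.4, 9.2)]
labels = [0, 0, 0, 1, 1]

# Group the records by cluster label.
groups = defaultdict(list)
for record, label in zip(records, labels):
    groups[label].append(record)

# For each cluster, report its size and the mean of every feature.
summary = {}
for label, members in sorted(groups.items()):
    size = len(members)
    means = tuple(sum(col) / size for col in zip(*members))
    summary[label] = (size, means)
    print(f"cluster {label}: size={size}, means={means}")
```

A very small cluster, or one whose means barely differ from another cluster's, is a hint that the grouping may not carry over to new data.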