Hakan Çelik · OpenCV / Machine Learning · 2 min read
Understanding K-Means Clustering

Goal
In this chapter, we will understand the concept of K-Means clustering and how it works.
Theory
Consider a company that is going to release a new T-shirt model to the market. It will have to manufacture the model in different sizes to fit people of all sizes. So the company collects data on people's height and weight and plots it on a graph.

The company can't make T-shirts in every possible size. Instead, it divides people into Small, Medium and Large, and manufactures only these three models. This grouping of people into three groups can be done with k-means clustering, and the algorithm gives us the three sizes that best represent the data.
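A minimal sketch of the T-shirt example in Python, assuming nine made-up (height, weight) measurements; the numbers are illustrative only, not real data. It runs plain k-means with k = 3 from several random starts and keeps the run with the lowest sum of squared distances, similar in spirit to the `attempts` argument of OpenCV's `cv2.kmeans`. The helper names `nearest` and `kmeans` are hypothetical, and mapping the centers to Small, Medium and Large by sorting on height is an assumption of this sketch.

```python
import math
import random

# Hypothetical (height cm, weight kg) measurements: made-up values, not real data.
people = [
    (152, 48), (155, 52), (158, 50),
    (168, 65), (170, 68), (172, 66),
    (183, 85), (186, 90), (188, 88),
]


def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))


def kmeans(points, k, max_iter=100, attempts=10, seed=0):
    """Plain k-means; returns (cost, centroids, labels) of the best attempt."""
    rng = random.Random(seed)
    best = None
    for _ in range(attempts):
        centroids = rng.sample(points, k)  # random initial centroids
        for _ in range(max_iter):
            labels = [nearest(p, centroids) for p in points]
            new = []
            for j in range(k):
                group = [p for p, lab in zip(points, labels) if lab == j]
                if group:  # move the centroid to the mean of its group
                    new.append(tuple(sum(axis) / len(group) for axis in zip(*group)))
                else:  # empty group: keep the old centroid
                    new.append(centroids[j])
            if new == centroids:  # centroids stopped moving: converged
                break
            centroids = new
        labels = [nearest(p, centroids) for p in points]
        # Sum of squared distances to the assigned centroids (lower is better).
        cost = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or cost < best[0]:
            best = (cost, centroids, labels)
    return best


cost, centers, labels = kmeans(people, k=3)
# Assumption: sizes are ordered by the height of each center.
for size, (h, w) in zip(["Small", "Medium", "Large"], sorted(centers)):
    print(f"{size}: {h:.1f} cm, {w:.1f} kg")
```

Each center is the "average person" of one group, so its height and weight are a natural starting point for the measurements of that size.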

How does it work?
This algorithm is an iterative process. To keep things simple, the steps below use only two groups (k = 2):
Step 1: The algorithm randomly chooses two centroids, C1 and C2 (often, two of the data points themselves are simply taken as the initial centroids).
Step 2: It calculates the distance from each point to both centroids. If a data point is closer to C1, it is labelled '0'; if it is closer to C2, it is labelled '1'. We color the points labelled '0' red and the points labelled '1' blue.
Step 3: Next, we calculate the average of all red ('0') points and of all blue ('1') points separately; these two averages become the new centroids.
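The three steps can be traced by hand on a tiny example. The following Python sketch uses six made-up 2-D points and, to keep the output reproducible, fixes the first two points as the initial centroids instead of choosing them at random (an assumption of this sketch, not part of Step 1). `assign` and `mean` are hypothetical helper names; ties in Step 2 go to C1, and the sketch does not handle a centroid that ends up with no points.

```python
import math

# Toy 2-D data with two obvious groups (made-up values for illustration).
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

# Step 1: initial centroids. Fixed by hand here (not random) for reproducibility.
c1, c2 = points[0], points[1]


def assign(points, c1, c2):
    """Step 2: label a point 0 if it is closer to c1, otherwise 1."""
    return [0 if math.dist(p, c1) <= math.dist(p, c2) else 1 for p in points]


def mean(group):
    """Step 3 helper: the average (x, y) of a non-empty group of points."""
    n = len(group)
    return (sum(x for x, _ in group) / n, sum(y for _, y in group) / n)


labels = assign(points, c1, c2)
red = [p for p, lab in zip(points, labels) if lab == 0]
blue = [p for p, lab in zip(points, labels) if lab == 1]
c1, c2 = mean(red), mean(blue)  # Step 3: the averages become the new centroids

print(labels)  # [0, 1, 0, 1, 1, 1]
print(c1, c2)  # (1.0, 1.5) (6.75, 6.5)
```

After a single pass, C2 has been pulled to (6.75, 6.5), between the two groups, so the point (2, 1) will switch to label '0' on the next pass; one iteration is not enough on its own.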
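Steps 2 and 3 are repeated until both centroids converge to fixed points, i.e. until the labels stop changing (implementations usually also stop after a maximum number of iterations or once the centroids move less than a small tolerance). At that point, the sum of squared distances between the data points and their corresponding centroids has reached a minimum, although this may be a local minimum that depends on the initial centroids:

$$J = \sum_{p \,\in\, \text{red}} \lVert p - C_1 \rVert^2 + \sum_{p \,\in\, \text{blue}} \lVert p - C_2 \rVert^2 \;\rightarrow\; \min$$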
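A small worked example with made-up 1-D values shows why Step 3 uses the average: for a fixed labelling, placing a centroid at the mean of its points gives the smallest squared-distance sum for that group, and moving it away only increases J.

$$
\begin{aligned}
\text{red} = \{1, 2, 3\},\ C_1 = 2:&\quad (1-2)^2 + (2-2)^2 + (3-2)^2 = 2\\
\text{blue} = \{8, 10\},\ C_2 = 9:&\quad (8-9)^2 + (10-9)^2 = 2\\
J &= 2 + 2 = 4\\
\text{moving } C_1 \text{ to } 2.5:&\quad (1-2.5)^2 + (2-2.5)^2 + (3-2.5)^2 = 2.75 > 2
\end{aligned}
$$

With C1 moved to 2.5, J rises from 4 to 4.75, so the mean is the better choice for that group.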


