Welcome to k-means clustering! Remember unsupervised learning from our first unit? k-means is a perfect example - it finds hidden groups in data without any labels.
Instead of predicting outcomes, k-means discovers natural clusters by grouping similar data points together.
Engagement Message
Can you share one real-world example where you'd want to segment customers without any labels?
k-means works by finding centroids—the center points of each cluster. Think of centroids as the "average location" of all the points in a group.
The algorithm starts with random centroids, then shifts them to better reflect the positions of their assigned points.
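The two steps above (assign points to the nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function names and the choice of 10 iterations are ours.

```python
import random

def squared_dist(a, b):
    # Squared Euclidean distance (no sqrt needed just for comparisons).
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=10):
    # Start with k randomly chosen data points as the initial centroids.
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (sum(x for x, _ in cluster) / len(cluster),
                                sum(y for _, y in cluster) / len(cluster))
    return centroids
```

In practice the loop stops when the centroids stop moving rather than after a fixed number of iterations, but a fixed count keeps the sketch simple.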
Engagement Message
Does this make sense?
Let's walk through a small example to see how k-means works in practice. Here's our dataset of 6 points on a 2D plane:
Notice how some points are close together? k-means will discover these natural groups.
Engagement Message
Which points do you think will end up in the same cluster?
Let's start with k=2 (two clusters). We randomly place our initial centroids at:
- Centroid 1: (3, 3)
- Centroid 2: (6, 6)
Now we assign each point to its nearest centroid using Euclidean distance.
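Here's what that assignment step looks like in code. The lesson's actual point coordinates aren't reproduced here, so the labeled points A–F below are illustrative placeholders; only the two centroids match the values given above.

```python
import math

# Placeholder coordinates for points A-F (the lesson's real values may differ).
points = {"A": (2, 3), "B": (3, 4), "C": (2, 2),
          "D": (6, 5), "E": (7, 7), "F": (5, 6)}

# The initial centroids from the example above.
centroids = {1: (3, 3), 2: (6, 6)}

def euclidean(p, q):
    # Straight-line (Euclidean) distance between two 2D points.
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Assign each labeled point to whichever centroid is closer.
assignment = {label: min(centroids, key=lambda c: euclidean(p, centroids[c]))
              for label, p in points.items()}
print(assignment)
# For these placeholder coordinates: A-C go to Centroid 1, D-F to Centroid 2.
```

Note that the comparison would give the same answer with squared distances, since the square root doesn't change which centroid is closer.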
Engagement Message
Can you type the letters of the points you think will be closer to Centroid 1?
