Quick Recap: Unraveling K-means Clustering

Before moving on to the practical application, let's recap K-means clustering, a core method in unsupervised learning. The principle is simple: the algorithm assigns each data point to the cluster whose centroid (the mean of the points in that cluster) is nearest, then recomputes the centroids, and repeats until the assignments stop changing. The goal is to minimize the within-cluster sum of squared distances between points and their centroids, a quantity known as inertia.
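To make inertia concrete, here is a minimal sketch that computes it by hand with NumPy. The four toy points, their cluster assignments, and the helper name `inertia` are made up for illustration and are not part of the lesson's dataset.

```python
import numpy as np

def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = points - centroids[labels]
    return float(np.sum(diffs ** 2))

# Hypothetical toy data: two obvious groups of two points each
points = np.array([[1.0, 1.0], [1.0, 3.0], [8.0, 8.0], [10.0, 8.0]])
labels = np.array([0, 0, 1, 1])

# Each centroid is the mean of the points assigned to that cluster
centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

total = inertia(points, labels, centroids)
print(centroids)
print(total)
```

A lower value means the points sit closer to their centroids, which is what K-means tries to achieve for a given number of clusters.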

We will now apply K-means clustering to a well-known dataset: the Iris dataset.

Diving into the Iris Dataset Again

The Iris dataset, as we've discussed in previous lessons, consists of measurements taken from 150 iris flowers across three distinct species, 50 flowers per species. Each flower is described by four features: sepal length, sepal width, petal length, and petal width. Imagine being a botanist searching for a systematic way to categorize new iris flowers based on these features. Doing so manually would be tedious, so machine learning, and K-means clustering in particular, is a logical choice!

Let's load this dataset using the sklearn library in Python and convert it into a pandas DataFrame:
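The loading step can be sketched as follows, assuming scikit-learn and pandas are installed. The variable names `iris` and `iris_df` and the extra `species` column are illustrative choices, not necessarily the ones used in the original lesson.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled Iris dataset (150 rows, 4 numeric features)
iris = load_iris()

# Build a DataFrame with one column per feature
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Keep the species label (0, 1, 2) alongside the features for later comparison
iris_df["species"] = iris.target

print(iris_df.head())
```

K-means itself never sees the `species` column; it is kept here only so that the clusters can be compared with the true species afterwards.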

Implementing K-means Clustering with sklearn

We're now going to implement K-means clustering on the Iris dataset. For this, we'll use the KMeans class from sklearn's cluster module. To keep our initial implementation straightforward, let's focus on just two dataset features: sepal length and .
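A minimal sketch of this step is shown below. The second feature is not legible in the text above, so the code assumes it is sepal width (the second column of the dataset). The choices `n_clusters=3`, `n_init=10`, and `random_state=42` are also assumptions made for a reproducible illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

# Columns 0 and 1 are sepal length and sepal width (the second one is an assumption)
X = iris.data[:, :2]

# Three clusters, matching the three species in the dataset (assumed setting)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# Fit the model and get a cluster index (0, 1, or 2) for each flower
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one (sepal length, sepal width) centroid per cluster
print(kmeans.inertia_)          # within-cluster sum of squared distances
```

The cluster indices are arbitrary: cluster 0 does not necessarily correspond to species 0, so a comparison with the true labels needs to account for that.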
