Before moving on to the practical application, let's recap K-means clustering, a core method in unsupervised learning. Its main principle is simple: it groups data points into distinct clusters based on their distances to each cluster's center (the centroid), aiming to minimize the within-cluster sum of squared distances, also known as inertia.
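To make the idea of inertia concrete, here is a minimal sketch that assigns a handful of points to their nearest centroid and sums the squared distances. The points and centroids are made-up illustrative values, and the snippet shows only a single assignment step, not the full iterative algorithm.

```python
import numpy as np

# Hypothetical toy data: four 2-D points and two fixed centroids
points = np.array([[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0]])
centroids = np.array([[1.0, 1.5], [5.5, 5.0]])

# Squared Euclidean distance from every point to every centroid
sq_dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

# Each point joins the cluster of its nearest centroid
labels = sq_dists.argmin(axis=1)

# Inertia: sum of squared distances from each point to its assigned centroid
inertia = sq_dists[np.arange(len(points)), labels].sum()

print(labels)
print(inertia)
```

K-means alternates this assignment step with recomputing each centroid as the mean of its assigned points, which lowers the inertia until the assignments stop changing.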
We will now apply K-means clustering to a well-known dataset: the Iris dataset.
The Iris dataset, as we've discussed in previous lessons, consists of four measurements (sepal length, sepal width, petal length, and petal width) taken from 150 iris flowers across three distinct species. Imagine being a botanist searching for a systematic way to categorize new iris flowers based on these features. Doing so manually would be tedious, so machine learning, and K-means clustering in particular, is a logical choice!
Let's load this dataset using the sklearn library in Python and convert it into a pandas DataFrame:
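One way this loading step might look is sketched below. It uses scikit-learn's load_iris function; the variable names iris and iris_df are illustrative choices, not names taken from the lesson.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled Iris dataset (150 samples, 4 numeric features)
iris = load_iris()

# Wrap the feature matrix in a DataFrame, using the dataset's own column names
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Peek at the first few rows
print(iris_df.head())
```

The resulting DataFrame has one row per flower and one column per measurement, with column names such as "sepal length (cm)".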
We're now going to implement K-means clustering on the Iris dataset. For this, we'll use the KMeans class from sklearn's cluster module. To keep our initial implementation straightforward, let's focus on just two dataset features, including sepal length.
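A minimal sketch of such an implementation could look like the following. Several details here are assumptions rather than the lesson's own choices: the second feature is taken to be sepal width, the number of clusters is set to 3 to match the three species, and the values of n_init and random_state are arbitrary settings that make the run reproducible.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this snippet runs on its own
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Assumption: pair sepal length with sepal width as the two features
features = iris_df[["sepal length (cm)", "sepal width (cm)"]]

# Assumption: 3 clusters, one per species; n_init and random_state are arbitrary
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# Fit the model and store each flower's cluster label
iris_df["cluster"] = kmeans.fit_predict(features)

# Inspect the learned centroids and the final inertia
print(kmeans.cluster_centers_)
print(kmeans.inertia_)
```

The cluster labels are arbitrary integers (0, 1, 2) and do not correspond to species in any fixed order, so comparing them against the true species requires an explicit mapping.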
