Welcome back to our exploration of clustering algorithms! Today, we'll cover a faster variant of the k-means algorithm: mini-batch k-means. While closely related to standard k-means, this variant substantially improves computational speed while keeping clustering quality close to that of the full algorithm. Let's discuss its Python implementation.
In machine learning, mini-batches are subsets of the data randomly sampled at each iteration of an algorithm. Because every iteration processes only a small sample rather than the full dataset, the cost per iteration drops sharply. For mini-batch k-means specifically, this technique significantly accelerates the clustering process.
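As a minimal sketch of the idea (the dataset, seed, and batch size here are purely illustrative), drawing a random mini-batch with NumPy might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.random((1000, 2))   # hypothetical dataset of 1,000 two-dimensional points

batch_size = 100               # illustrative mini-batch size
# Sample `batch_size` row indices without replacement for this iteration
indices = rng.choice(data.shape[0], size=batch_size, replace=False)
mini_batch = data[indices]

print(mini_batch.shape)        # (100, 2)
```

Each iteration of the algorithm would draw a fresh sample like this, so over many iterations the whole dataset still influences the centroids.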
Before delving into the mini-batch k-means implementation, we must establish some preparatory functions and a working dataset. Our dataset consists of two distinct clusters. We'll need a function that calculates the Euclidean distance, a way to randomly initialize our centroids, and a step that assigns each data point to its closest centroid.
We calculate the Euclidean distance using the formula $d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$. This formula represents the straight-line distance between two points $\mathbf{p}$ and $\mathbf{q}$ in $n$-dimensional space.
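Here is one way these pieces might be sketched out. The function names (`euclidean_distance`, `initialize_centroids`, `assign_clusters`) and the synthetic two-cluster dataset are assumptions for illustration, not a fixed API:

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic dataset with two distinct clusters (locations and spread are illustrative)
cluster_1 = rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
cluster_2 = rng.normal(loc=[5, 5], scale=0.5, size=(100, 2))
data = np.vstack([cluster_1, cluster_2])

def euclidean_distance(a, b):
    """Straight-line distance between points a and b."""
    return np.sqrt(np.sum((a - b) ** 2))

def initialize_centroids(data, k):
    """Pick k distinct data points at random to serve as initial centroids."""
    indices = rng.choice(data.shape[0], size=k, replace=False)
    return data[indices]

def assign_clusters(data, centroids):
    """Assign each point the index of its nearest centroid."""
    return np.array([
        np.argmin([euclidean_distance(point, c) for c in centroids])
        for point in data
    ])

centroids = initialize_centroids(data, k=2)
labels = assign_clusters(data, centroids)
print(labels[:5], labels[-5:])  # points from each cluster get different labels
```

With these helpers in place, the mini-batch loop only needs to sample a batch, assign its points, and nudge the centroids toward the batch means.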
