Introduction

Welcome back to our exploration of clustering algorithms! Today, we'll cover a variant of the k-means algorithm: mini-batch k-means. This variant substantially improves computational speed while keeping clustering quality close to that of standard k-means. Let's discuss its Python implementation.

Understanding the Mini-Batch Concept

In machine learning, mini-batches are subsets of the data selected at random for each iteration of an algorithm. Because each iteration processes only a small sample rather than the full dataset, the computational cost per iteration drops sharply. For mini-batch k-means specifically, this technique significantly accelerates the clustering process.
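To make this concrete, here is a minimal sketch of drawing one mini-batch with NumPy; the array and the `batch_size` value are illustrative choices, not part of the lesson's dataset:

```python
import numpy as np

# Illustrative data: 1,000 two-dimensional points (not the lesson's dataset).
data = np.random.rand(1000, 2)
batch_size = 64  # assumed batch size, chosen for demonstration

# Draw one mini-batch: a random subset of rows, sampled without replacement.
indices = np.random.choice(len(data), size=batch_size, replace=False)
mini_batch = data[indices]

print(mini_batch.shape)  # (64, 2)
```

Each iteration of the algorithm would draw a fresh subset like this, so no single pass ever touches all 1,000 points.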

Generating the Dataset and Preliminaries

Before delving into the mini-batch k-means implementation, we need a few preparatory functions and a working dataset. Our dataset consists of two distinct clusters. We'll write a function to compute the Euclidean distance, randomly initialize our centroids, and assign each data point to its closest centroid, as shown in the sketches below.
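One possible way to build such a dataset is with two well-separated Gaussian blobs; the generation method and parameters here are assumptions for illustration, not taken from the lesson:

```python
import numpy as np

np.random.seed(42)  # reproducibility

# Two well-separated Gaussian blobs of 100 points each (assumed parameters).
cluster_a = np.random.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
cluster_b = np.random.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))

data = np.vstack([cluster_a, cluster_b])  # shape: (200, 2)
```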

We calculate the Euclidean distance using the formula $d(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$. This formula represents the straight-line distance between two points.
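A minimal sketch of these preliminaries might look like the following; the helper names (`euclidean_distance`, `initialize_centroids`, `assign_to_centroids`) are hypothetical, chosen here for illustration:

```python
import numpy as np

def euclidean_distance(a, b):
    # d(a, b) = sqrt(sum_i (a_i - b_i)^2): straight-line distance.
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def initialize_centroids(data, k):
    # Pick k distinct data points at random as the initial centroids.
    indices = np.random.choice(len(data), size=k, replace=False)
    return data[indices]

def assign_to_centroids(data, centroids):
    # Label each point with the index of its closest centroid.
    distances = np.array([[euclidean_distance(p, c) for c in centroids]
                          for p in data])
    return np.argmin(distances, axis=1)

# Illustrative usage on a small two-cluster dataset (assumed setup):
data = np.vstack([np.random.normal([0, 0], 0.5, (100, 2)),
                  np.random.normal([5, 5], 0.5, (100, 2))])
centroids = initialize_centroids(data, k=2)
labels = assign_to_centroids(data, centroids)
print(labels[:5], labels[-5:])  # points near (0, 0) vs. points near (5, 5)
```

These pieces are all the algorithm needs before the mini-batch update loop itself: a distance measure, a starting set of centroids, and a rule for assigning points to them.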
