Let's recall that Stochastic Gradient Descent (SGD) is an efficient optimization algorithm, valued for how cheaply it updates model parameters. However, when dealing with large datasets, updating from one example at a time can make the loss function fluctuate noticeably. To overcome this limitation, we'll discuss Mini-Batch Gradient Descent (MBGD) in this session - a technique that combines the best attributes of SGD and Batch Gradient Descent. By the end of today's lesson, you'll understand the theory behind MBGD and be ready to implement it in Python.
While SGD's power lies in its efficiency, especially when dealing with large datasets, it has limitations. Because the model's parameters are updated from a single example at each iteration, the loss function can become unstable. This instability is one of the primary challenges that MBGD aims to overcome.
MBGD offers a conceptual middle ground between SGD and Batch Gradient Descent. Instead of using a single example or the entire dataset, MBGD divides the dataset into small subsets, or mini-batches. It then computes the gradient of the cost function with respect to each mini-batch and updates the model's parameters accordingly.
A distinguishing feature of MBGD is that the mini-batch size can be tuned. If the batch size equals the dataset size, MBGD behaves like Batch Gradient Descent; if the batch size is 1, it acts like SGD. In practice, a mini-batch size between 10 and 1000 is typically selected.
Now, we'll delve into Python to implement MBGD. For this, we'll use `numpy` for numerical computations. The `gradient_descent` function carries out Mini-Batch Gradient Descent:
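A minimal sketch of such a `gradient_descent` function is shown below. It assumes a simple linear-regression model trained with a mean-squared-error loss; the `batch_size`, `learning_rate`, and `epochs` parameters are illustrative defaults rather than values prescribed here.

```python
import numpy as np

def gradient_descent(X, y, batch_size=32, learning_rate=0.01, epochs=100):
    """Mini-Batch Gradient Descent for linear regression (illustrative sketch)."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0

    for epoch in range(epochs):
        # Shuffle the data so each epoch visits the mini-batches in a new order
        indices = np.random.permutation(n_samples)
        X_shuffled, y_shuffled = X[indices], y[indices]

        # Walk through the dataset one mini-batch at a time
        for start in range(0, n_samples, batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            y_batch = y_shuffled[start:start + batch_size]

            # Predictions and errors for this mini-batch only
            errors = X_batch @ weights + bias - y_batch

            # Gradients of the mean squared error w.r.t. weights and bias
            grad_w = (2 / len(X_batch)) * (X_batch.T @ errors)
            grad_b = (2 / len(X_batch)) * errors.sum()

            # Update the parameters using this mini-batch's gradient
            weights -= learning_rate * grad_w
            bias -= learning_rate * grad_b

    return weights, bias
```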

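As a quick, hypothetical usage example, note that this one function covers the whole spectrum discussed above: `batch_size=len(X)` reproduces Batch Gradient Descent, `batch_size=1` reproduces SGD, and an intermediate value gives MBGD.

```python
# Hypothetical data: y = 3x + 2 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=200)

# Mini-batches of 32 samples; try batch_size=1 or batch_size=len(X) to compare
weights, bias = gradient_descent(X, y, batch_size=32, learning_rate=0.1, epochs=200)
print(weights, bias)  # expected to land close to [3] and 2
```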