Let's recall that Stochastic Gradient Descent (SGD) is an efficient optimization algorithm, valued for how cheaply it updates model parameters. However, when dealing with large datasets, updating from one example at a time can make the loss function fluctuate noticeably. To overcome this limitation, we'll discuss Mini-Batch Gradient Descent (MBGD) in this session - a technique that combines the best attributes of SGD and Batch Gradient Descent. By the end of today's lesson, you'll understand the theory behind MBGD and be ready to implement it in Python.
While SGD's power lies in its efficiency, especially when dealing with large datasets, it has limitations. Because the model's parameters are updated from a single example at each iteration, the loss function can become unstable. This instability is one of the primary challenges that MBGD aims to overcome.
MBGD offers a conceptual middle ground between SGD and Batch Gradient Descent. Instead of using a single example or the entire dataset, MBGD divides the dataset into small subsets, or mini-batches. It then computes the gradient of the cost function with respect to each mini-batch and updates the model's parameters accordingly.
A distinguishing feature of MBGD is that the mini-batch size can be tuned. If the batch size equals the dataset size, MBGD behaves like Batch Gradient Descent; if the batch size is 1, it acts like SGD. In practice, a mini-batch size between 10 and 1000 is typically selected.
Now, we'll delve into Python to implement MBGD. For this, we'll use `numpy` for numerical computations. The `gradient_descent` function carries out Mini-Batch Gradient Descent:
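A minimal sketch of such a `gradient_descent` function is shown below. It assumes a simple linear-regression model trained with a mean-squared-error loss; the `batch_size`, `learning_rate`, and `epochs` parameters are illustrative defaults rather than values prescribed here.

```python
import numpy as np

def gradient_descent(X, y, batch_size=32, learning_rate=0.01, epochs=100):
    """Mini-Batch Gradient Descent for linear regression (illustrative sketch)."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0

    for epoch in range(epochs):
        # Shuffle the data so each epoch visits the mini-batches in a new order
        indices = np.random.permutation(n_samples)
        X_shuffled, y_shuffled = X[indices], y[indices]

        # Walk through the dataset one mini-batch at a time
        for start in range(0, n_samples, batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            y_batch = y_shuffled[start:start + batch_size]

            # Predictions and errors for this mini-batch only
            errors = X_batch @ weights + bias - y_batch

            # Gradients of the mean squared error w.r.t. weights and bias
            grad_w = (2 / len(X_batch)) * (X_batch.T @ errors)
            grad_b = (2 / len(X_batch)) * errors.sum()

            # Update the parameters using this mini-batch's gradient
            weights -= learning_rate * grad_w
            bias -= learning_rate * grad_b

    return weights, bias
```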

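As a quick, hypothetical usage example, note that this one function covers the whole spectrum discussed above: `batch_size=len(X)` reproduces Batch Gradient Descent, `batch_size=1` reproduces SGD, and an intermediate value gives MBGD.

```python
# Hypothetical data: y = 3x + 2 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=200)

# Mini-batches of 32 samples; try batch_size=1 or batch_size=len(X) to compare
weights, bias = gradient_descent(X, y, batch_size=32, learning_rate=0.1, epochs=200)
print(weights, bias)  # expected to land close to [3] and 2
```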