Getting Started with Momentum

Hello! Today, we will learn about a powerful technique that makes our Gradient Descent move faster, like a ball rolling down a hill. We call this "Momentum".

What Momentum Is and How It Works

Momentum improves our Gradient Descent. How does it do that? Remember how a ball on top of a hill starts rolling down? If the slope is steep, the ball picks up speed, right? That's what momentum does to our Gradient Descent: it builds up speed when the gradient keeps pointing in the same direction across iterations.

How to Add Momentum to Gradient Descent

Let's get down to coding! We will demonstrate the effect of momentum in a gradient descent process using a gradient function, grad_func(). The weight, or parameter (theta), starts at a point and moves down the slope by adjusting itself in every iteration, or 'epoch', according to the following update rules:

v := v \cdot \gamma + \alpha \cdot \text{gradient}

\theta := \theta - v

Where:

  • \theta is the parameter vector,
  • gradient is the gradient of the cost function with respect to the parameters at the current parameter value,
  • \alpha is the learning rate,
  • v is the velocity vector (initialized to 0), and
  • \gamma is the momentum parameter (a new hyperparameter).

A higher \gamma gives past gradients more influence, which generally speeds up convergence; if it is set too high, though, the updates can overshoot the minimum.
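As a quick sanity check with purely illustrative numbers: suppose \gamma = 0.9, \alpha = 0.01, the current velocity is v = 0.1, and the gradient is 10. Then:

v := 0.1 \cdot 0.9 + 0.01 \cdot 10 = 0.09 + 0.1 = 0.19

So \theta decreases by 0.19, almost twice the plain gradient step of \alpha \cdot \text{gradient} = 0.1.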

Here is the Python implementation:
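What follows is a minimal sketch. The cost function (x^2, so grad_func() returns 2 * theta), the starting point, and the hyperparameter values are illustrative assumptions rather than fixed choices:

```python
def grad_func(theta):
    # Gradient of the illustrative cost function J(theta) = theta^2
    return 2 * theta

theta = 5.0   # parameter, starting away from the minimum at 0
v = 0.0       # velocity, initialized to 0
alpha = 0.01  # learning rate
gamma = 0.9   # momentum parameter

for epoch in range(100):
    gradient = grad_func(theta)        # gradient at the current parameter value
    v = v * gamma + alpha * gradient   # new velocity: decayed old velocity plus scaled gradient
    theta = theta - v                  # move the parameter by the velocity

print(round(theta, 4))  # close to 0, the minimum of theta^2
```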

We compute the gradient at the current parameter value. Then we calculate the new velocity: the old velocity scaled by the momentum parameter, plus the learning rate times the gradient. Finally, we update the parameter by subtracting this velocity from it.

Compare Gradient Descents: Setup

Now let's visualize how momentum aids in faster convergence (which means getting to the answer quicker) in the following code snippet:
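Here is one possible version of that snippet, a sketch that reuses the illustrative grad_func() and hyperparameters from above and runs both methods side by side:

```python
def grad_func(theta):
    # Gradient of the illustrative cost function J(theta) = theta^2
    return 2 * theta

epochs = 50
alpha = 0.01  # learning rate, shared by both methods
gamma = 0.9   # momentum parameter

theta_plain = 5.0     # weight for plain gradient descent
theta_momentum = 5.0  # weight for momentum-based gradient descent
v = 0.0               # velocity for the momentum method

history_plain = [theta_plain]
history_momentum = [theta_momentum]

for epoch in range(epochs):
    # Plain update: step directly along the negative gradient
    theta_plain = theta_plain - alpha * grad_func(theta_plain)
    history_plain.append(theta_plain)

    # Momentum update: accumulate velocity, then step by the velocity
    v = v * gamma + alpha * grad_func(theta_momentum)
    theta_momentum = theta_momentum - v
    history_momentum.append(theta_momentum)
```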

Here, we implement plain and momentum gradients within one loop and track the history of weight changes to visualize them later.

Compare Gradient Descents: Visualization

Let's visualize the comparison:
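Below is a sketch using matplotlib; it assumes the history_plain and history_momentum lists from the previous snippet and plots the cost x^2 at every recorded weight:

```python
import matplotlib.pyplot as plt

# Cost J(theta) = theta^2 at every recorded weight
cost_plain = [theta ** 2 for theta in history_plain]
cost_momentum = [theta ** 2 for theta in history_momentum]

plt.plot(cost_plain, label='Gradient Descent')
plt.plot(cost_momentum, label='Momentum-based Gradient Descent')
plt.xlabel('Epoch')
plt.ylabel('Cost')
plt.legend()
plt.show()
```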

The resulting plot compares Gradient Descent (without momentum) and Momentum-based Gradient Descent on the same function, x^2. The graph shows how the cost (the value of the function) changes over the epochs. The cost shrinks faster for the Momentum-based method. That's because it gets a speed boost from the momentum, just like the ball rolling down the hill!

Wrapping Up

You've done it! You've understood how to use momentum to improve Gradient Descent and seen it in action. Doesn't the ball-on-a-hill analogy make it easier to understand? Now, it's time to put your knowledge into practice! If you remember how a rolling ball picks up speed, you'll never forget how momentum improves Gradient Descent. Happy practicing and coding!
