Welcome to the first lesson of the Advanced Neural Tuning course. In this course, you will learn how to make your neural networks train more efficiently and achieve better results by using advanced optimization techniques. We will start with a key concept: learning rate scheduling.
The learning rate is a crucial parameter in training neural networks. It controls how much the model's weights are updated during each step of training. If the learning rate is too high, the model might not learn well and could even diverge. If it is too low, training can be very slow and might get stuck before reaching a good solution.
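To make this concrete, here is a tiny self-contained sketch (plain Python, with illustrative values chosen for this lesson) that minimizes f(x) = x² with gradient descent at three different learning rates:

```python
# Gradient descent on f(x) = x**2, whose gradient is 2x.
# Three illustrative learning rates: too high, too low, and reasonable.
for lr in (1.1, 0.001, 0.1):
    x = 1.0
    for _ in range(50):
        x = x - lr * 2 * x  # standard gradient descent update
    print(f"lr={lr}: x after 50 steps = {x:.3g}")

# lr=1.1   -> |x| grows every step: the run diverges
# lr=0.001 -> x barely moves from 1.0: training is very slow
# lr=0.1   -> x lands close to the minimum at 0
```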
Learning rate scheduling is a technique in which you change the learning rate during training instead of keeping it constant. This can help your model learn faster at the beginning and fine-tune its weights as training progresses. In this lesson, you will learn how to use a popular learning rate scheduler in PyTorch called `StepLR`.
The `StepLR` scheduler is a simple but effective way to adjust the learning rate as your model trains. In PyTorch, `StepLR` reduces the learning rate by a certain factor every fixed number of epochs. This helps the model make big updates early on and then smaller, more careful updates as it gets closer to a good solution.
The two main parameters for `StepLR` are `step_size` and `gamma`. The `step_size` tells the scheduler how many epochs to wait before reducing the learning rate. The `gamma` parameter is the factor by which the learning rate is multiplied each time it is reduced. For example, if your initial learning rate is 0.1, your `step_size` is 10, and your `gamma` is 0.1, then after 10 epochs, the learning rate will become 0.01.
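This decay has a simple closed form: after a given number of completed epochs, the learning rate is `initial_lr * gamma ** (epochs // step_size)`. As a quick sanity check in plain Python, using the values from the example above:

```python
# StepLR's schedule: lr = initial_lr * gamma ** (completed_epochs // step_size)
initial_lr, step_size, gamma = 0.1, 10, 0.1

for epoch in (0, 9, 10, 19, 20):
    lr = initial_lr * gamma ** (epoch // step_size)
    print(f"after {epoch} epochs: lr = {round(lr, 6)}")
# after 0-9 epochs the rate is 0.1, after 10-19 it is 0.01, after 20+ it is 0.001
```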
Choosing the right values for `step_size` and `gamma` depends on your dataset, model, and training dynamics:
- `step_size`: A smaller `step_size` (e.g., 5) means the learning rate will drop more frequently, which can help if your model quickly reaches plateaus or if you want to fine-tune early. A larger `step_size` (e.g., 20 or 30) keeps the learning rate high for longer, which can be useful for larger datasets or more complex models that need more time to learn before fine-tuning.
- `gamma`: A smaller `gamma` (e.g., 0.1) means the learning rate drops sharply, which can help the model converge quickly after the drop. A larger `gamma` (e.g., 0.5 or 0.7) results in a more gradual decrease, which can be useful if you want to avoid sudden changes in training dynamics; the comparison sketch after this list shows both.
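To see the difference a larger `gamma` makes, here is a short comparison using the same closed form as before (the specific values are just for illustration):

```python
# Compare a sharp drop (gamma=0.1) with a gradual one (gamma=0.5)
initial_lr, step_size = 0.1, 10

for gamma in (0.1, 0.5):
    lrs = [round(initial_lr * gamma ** (epoch // step_size), 6) for epoch in range(30)]
    # learning rate used in epochs 1-10, 11-20, and 21-30
    print(f"gamma={gamma}: {lrs[0]}, {lrs[10]}, {lrs[20]}")

# gamma=0.1: 0.1, 0.01, 0.001  (sharp drops)
# gamma=0.5: 0.1, 0.05, 0.025  (gentler decay)
```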
Why 10 and 0.1?
In the example, `step_size=10` and `gamma=0.1` are chosen to demonstrate a clear, noticeable drop in the learning rate every 10 epochs. These are common starting points, but you should experiment with these values based on your model’s performance. If you notice your model stops improving before the next scheduled drop, try reducing `step_size`. If the learning rate drops too much and training stalls, try increasing `gamma`.
Let’s look at how you can use `StepLR` in a typical PyTorch training loop. Here is a code example that shows how to set up and use the scheduler:
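This is a minimal sketch; the linear model, synthetic data, and 30-epoch loop are placeholder assumptions for illustration:

```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

# Placeholder model and synthetic regression data
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
inputs = torch.randn(64, 10)
targets = torch.randn(64, 1)

optimizer = SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by gamma=0.1 every step_size=10 epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(1, 31):
    # The learning rate the optimizer will use in this epoch
    print(f"Epoch {epoch}: learning rate = {optimizer.param_groups[0]['lr']:.4g}")

    optimizer.zero_grad()                        # clear old gradients
    loss = criterion(model(inputs), targets)     # forward pass
    loss.backward()                              # backward pass: compute gradients
    optimizer.step()                             # update weights with the current learning rate
    scheduler.step()                             # then update the learning rate for the next epoch
```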
Why is `scheduler.step()` called after `optimizer.step()`?
The order is important. `optimizer.step()` updates the model parameters using the current learning rate. `scheduler.step()` then updates the learning rate for the next epoch, based on the current epoch count. If you call `scheduler.step()` before `optimizer.step()`, the learning rate would be updated before the optimizer uses it, which can lead to off-by-one errors in your schedule. Always call `optimizer.step()` first, then `scheduler.step()`.
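Here is a small plain-Python sketch of that off-by-one, assuming the counter semantics described above (the scheduler's internal epoch count increments on every `scheduler.step()` call):

```python
# lr for a given counter value n follows: initial_lr * gamma ** (n // step_size)
initial_lr, step_size, gamma = 0.1, 10, 0.1

def lr_at(n):
    return round(initial_lr * gamma ** (n // step_size), 6)

# Correct order: the weight update in epoch e sees counter value e - 1
print([lr_at(e - 1) for e in (9, 10, 11)])  # [0.1, 0.1, 0.01] -> first drop at epoch 11
# Reversed order: scheduler.step() runs first, so epoch e sees counter value e
print([lr_at(e) for e in (9, 10, 11)])      # [0.1, 0.01, 0.01] -> drop arrives a step early
```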
If you print the learning rate at the start of each epoch, as in the example above, you would see something like this (assuming the initial learning rate is 0.1):
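```
Epoch 1: learning rate = 0.1
Epoch 2: learning rate = 0.1
...
Epoch 10: learning rate = 0.1
Epoch 11: learning rate = 0.01
...
Epoch 20: learning rate = 0.01
Epoch 21: learning rate = 0.001
...
Epoch 30: learning rate = 0.001
```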
This shows how the learning rate drops at epochs 11 and 21, following the schedule you set.
In this lesson, you learned why the learning rate is important and how changing it during training can help your neural network learn better. You were introduced to the `StepLR` scheduler in PyTorch, saw how it works, learned how to choose its parameters, and understood the importance of the call order in the training loop.
Next, you will get a chance to practice using `StepLR` yourself. In the following exercises, you will set up a scheduler, run a training loop, and observe how the learning rate changes over time. This hands-on practice will help you become comfortable with learning rate scheduling and prepare you for more advanced optimization techniques in the rest of the course.
