Welcome back! In the last lesson, you learned how to use dropout to help your neural network generalize better and avoid overfitting. As a quick reminder, overfitting happens when your model learns the training data too well, including its noise, and then struggles to perform well on new, unseen data. Dropout
is one way to address this, but it is not the only tool available. In this lesson, I will introduce you to another important technique: early stopping.
Early stopping is a simple but powerful method to prevent overfitting during training. Instead of training your model for a fixed number of epochs, you monitor its performance on a validation set and stop training when the model stops improving. This way, you avoid wasting time and resources on training that does not help your model get better — and you also reduce the risk of overfitting. Early stopping is widely used in deep learning and is especially helpful when you are not sure how many epochs your model really needs.
The main idea behind early stopping is to keep an eye on your model’s performance on a validation set during training. After each epoch, you check the validation loss
— a measure of how well your model is doing on data it has not seen before. If the validation loss keeps getting better, you continue training. But if the validation loss stops improving for a certain number of epochs, called the patience, you stop training early.
Patience is a key parameter in early stopping. It tells your training loop how many epochs to wait for an improvement before giving up. For example, if patience
is set to 5 and the validation loss does not improve for 5 epochs in a row, training will stop. This helps you avoid stopping too soon if there is a small bump in the loss, but also prevents you from training for too long when there is no real progress.
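The patience mechanism on its own can be sketched in plain Python, independent of any framework. The loss values and the `epochs_trained` helper below are made up purely for illustration:

```python
def epochs_trained(val_losses, patience=5):
    """Return the epoch at which training would stop, given a list of
    hypothetical validation losses and a patience value."""
    best_loss = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            wait = 0          # improvement: reset the counter
        else:
            wait += 1         # no improvement: count this epoch
            if wait >= patience:
                return epoch  # early stop triggered here
    return len(val_losses)    # trained all the way to the end

# Losses improve for 3 epochs, then plateau: training stops 5 epochs later.
print(epochs_trained([0.9, 0.7, 0.6, 0.65, 0.66, 0.61, 0.62, 0.63], patience=5))
# → 8
```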
One important detail to keep in mind: the model parameters at the end of training (the "last model") may not correspond to the best performance on the validation set. Often, the best model (the one with the lowest validation loss) occurs several epochs before training actually stops. If you only use the model as it is at the end of training, you might not get the best results.
To address this, you should save a copy of the model’s parameters whenever a new best validation loss is achieved. After early stopping triggers, you can reload these saved parameters to ensure you are using the best version of your model.
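In PyTorch, this save-and-restore step usually comes down to deep-copying the model's `state_dict()`. The tiny `nn.Linear` model below is just a placeholder for illustration; the key detail is that `state_dict()` returns references to the live parameter tensors, so a plain assignment would not freeze a snapshot:

```python
import copy
import torch
import torch.nn as nn

# A hypothetical stand-in model, just for illustration.
model = nn.Linear(4, 2)

# When a new best validation loss is found, deep-copy the parameters.
# state_dict() returns references to the live tensors, so copy.deepcopy
# is needed to take a frozen snapshot of the weights.
best_model_state = copy.deepcopy(model.state_dict())

# ... training continues and changes the model's weights ...
with torch.no_grad():
    model.weight.add_(1.0)

# After early stopping triggers, restore the best parameters.
model.load_state_dict(best_model_state)
```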
Let’s look at how you can add early stopping to your PyTorch training loop and also save the best model. Below is a code example that shows a typical way to implement early stopping and model checkpointing. This code assumes you already have your model, optimizer, loss function, and data prepared, just like in the previous lessons.
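A minimal, self-contained version of such a loop might look like this. The tiny `nn.Linear` model, the random tensors, and the cap of 100 epochs are stand-ins for your own model and data from the previous lessons, not part of any fixed recipe:

```python
import copy
import torch
import torch.nn as nn

# Synthetic stand-ins for your real training and validation data.
torch.manual_seed(0)
X_train, y_train = torch.randn(64, 10), torch.randn(64, 1)
X_val, y_val = torch.randn(32, 10), torch.randn(32, 1)

model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

best_loss = float("inf")   # any real validation loss will beat this
patience = 5               # epochs to wait for an improvement
wait = 0                   # epochs since the last improvement
best_model_state = None    # snapshot of the best parameters so far

for epoch in range(100):
    # --- training step ---
    model.train()
    optimizer.zero_grad()
    train_loss = criterion(model(X_train), y_train)
    train_loss.backward()
    optimizer.step()

    # --- validation step ---
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()
    print(f"Epoch {epoch + 1}: val_loss = {val_loss:.4f}")

    # --- early stopping check ---
    if val_loss < best_loss:
        best_loss = val_loss
        wait = 0
        best_model_state = copy.deepcopy(model.state_dict())
    else:
        wait += 1
        if wait >= patience:
            print("Early stopping!")
            break

# Restore the best parameters before evaluating or deploying the model.
model.load_state_dict(best_model_state)
```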
Here’s how this code works. At the start, best_loss is set to infinity, so any real validation loss will be better. The patience variable is set to 5, which means training will stop if the validation loss does not improve for 5 epochs in a row. The wait variable keeps track of how many epochs have passed since the last improvement, and the best_model_state variable stores the parameters of the best model seen so far.
During each epoch, the model is trained on the training data as usual. After training, the model is evaluated on the validation set, and the validation loss is printed. If the validation loss is lower than the best loss seen so far, best_loss is updated, wait is reset to zero, and the model’s parameters are saved. If not, wait is increased by one. If wait reaches the value of patience, the loop prints "Early stopping!" and breaks out of the training loop.
The most important parameter to tune in early stopping is the patience value. If you set patience too low, you might stop training before your model has a chance to recover from a small bump in the validation loss. If you set it too high, you might end up training for too long and risk overfitting. A good starting point is usually between 3 and 10 epochs, but the best value depends on your dataset and model.
You can also experiment with other stopping criteria, such as monitoring a different metric (like accuracy) or using a minimum change threshold to decide what counts as an improvement. For most cases, though, monitoring validation loss with a reasonable patience value works well.
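For example, a minimum change threshold (often called min_delta) only counts a new loss as an improvement if it beats the best by some margin, which keeps tiny fluctuations from resetting the patience counter. The helper below is a hypothetical sketch of that check:

```python
def is_improvement(val_loss, best_loss, min_delta=1e-3):
    """Count val_loss as an improvement only if it beats best_loss
    by at least min_delta."""
    return val_loss < best_loss - min_delta

# A drop of 0.0005 is within the noise threshold, so it does not count:
print(is_improvement(0.5000, 0.5005))  # → False
# A clear drop does count:
print(is_improvement(0.4900, 0.5005))  # → True
```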
In this lesson, you learned how early stopping can help you train better neural networks by stopping training when your model stops improving on the validation set. You saw how to add early stopping to your PyTorch training loop, how the patience
parameter works, and how to tune it for your needs. You also learned the importance of saving and restoring the best model parameters, so you always use the model that performed best during training. Early stopping is a simple but effective way to save time and avoid overfitting.
Next, you will get a chance to practice adding and adjusting early stopping in your own training loops. This hands-on experience will help you see how early stopping works in practice and how it can improve your models. When you are ready, move on to the practice exercises to apply what you have learned!
