Welcome back! In the last lesson, you learned how to build, train, and evaluate a simple neural network in PyTorch. You also saw how to monitor validation loss to check for overfitting — a common problem where a model performs well on training data but struggles with new, unseen data. As a quick reminder, overfitting happens when a model learns the training data too well, including its noise and random details, which makes it less effective on real-world data.
To address overfitting, one of the most popular and easy-to-use techniques is called dropout. Dropout is a regularization method that helps your neural network generalize better, so it can perform well not just on the training data but also on new data it has never seen before. In this lesson, you will learn what dropout is, how it works, and how to add it to your PyTorch models.
The main idea behind dropout is simple but powerful. During training, dropout randomly "drops out" (sets to zero) a fraction of the neurons in a layer on each forward pass. This means that each time the model sees a batch of data, it uses a slightly different set of neurons. As a result, the network cannot rely too much on any single neuron and is forced to learn more robust features.
Typically, dropout is applied after the activation function of a hidden layer, not on the input or output layers. The activation function introduces non-linearity and transforms the raw outputs of the neurons, so applying dropout after it zeroes out the actual activated features that are passed to the next layer. This matches the intended effect of dropout: preventing the network from relying too heavily on any specific activated feature. If dropout were applied before the activation instead, it would zero out the raw, unactivated values and could shift the distribution of inputs to the activation function in unpredictable ways. In practice, placing dropout after the activation is the standard approach and tends to work well.
The most common dropout rate is 0.5, which means that half of the neurons are randomly dropped during each training step. During evaluation (when you are testing or using the model), dropout is turned off, and all neurons are used.
This simple trick helps prevent the network from becoming too specialized to the training data, making it more likely to perform well on new data.
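To see this behavior directly, here is a small, self-contained snippet (separate from the lesson's model) that applies a standalone dropout layer to a tensor of ones, first in training mode and then in evaluation mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)       # for a reproducible illustration
dropout = nn.Dropout(0.5)  # each element has a 50% chance of being zeroed in training mode
x = torch.ones(8)

dropout.train()            # training mode: dropout is active
print(dropout(x))          # roughly half the values are 0; the survivors are scaled up

dropout.eval()             # evaluation mode: dropout is disabled
print(dropout(x))          # all ones pass through unchanged
```

Note that PyTorch rescales the surviving activations by 1 / (1 - p) during training, so the expected magnitude of the layer's output matches what the network sees at evaluation time, when dropout is off.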
Adding dropout to your PyTorch model is straightforward. PyTorch provides the nn.Dropout layer, which you can insert into your model just like any other layer. The dropout layer takes one argument: the dropout rate, which is the fraction of neurons to drop during training. For example, nn.Dropout(0.5) means that each neuron has a 50% chance of being set to zero during each forward pass in training mode.
You usually place the dropout layer after an activation function in a hidden layer. In PyTorch, if you are using nn.Sequential to build your model, you can simply add nn.Dropout between layers. The dropout layer will only be active during training; when you switch the model to evaluation mode with model.eval(), dropout is automatically disabled.
Let’s look at a concrete example. Below is a modified version of the multilayer perceptron (MLP) you saw in the previous lesson. This time, we add a dropout layer after the first hidden layer’s activation function. The dropout rate is set to 0.5, which is a common starting point.
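Since the exact MLP from the previous lesson isn't reproduced here, the sketch below shows one way the modified model might look; the input size of 10 is a placeholder, so substitute the number of features in your own dataset.

```python
import torch.nn as nn

class MLPDropout(nn.Module):
    def __init__(self, input_size=10):  # input_size is a placeholder; use your dataset's feature count
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_size, 64),  # hidden layer with 64 units
            nn.ReLU(),                  # activation function
            nn.Dropout(0.5),            # dropout after the activation: 50% of activations are zeroed during training
            nn.Linear(64, 1),           # output layer for binary classification
            nn.Sigmoid(),               # converts the output to a probability between 0 and 1
        )

    def forward(self, x):
        return self.network(x)

model = MLPDropout()
```

You would train and evaluate this model just as before, calling model.train() before training and model.eval() before validation or inference so that dropout is switched on and off at the right times.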
In this code, the MLPDropout class defines a neural network with one hidden layer of 64 units, followed by a ReLU activation. Immediately after the activation, we add a dropout layer with a rate of 0.5. The output layer remains the same, using a sigmoid activation for binary classification. When you train this model, PyTorch will randomly drop half of the neurons in the hidden layer on each forward pass. When you evaluate the model, dropout is automatically turned off, and all neurons are used.
If you print the model, you can inspect its structure and confirm the layer order.
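With the sketch above (and its assumed input size of 10), print(model) produces output along these lines:

```text
MLPDropout(
  (network): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=64, out_features=1, bias=True)
    (4): Sigmoid()
  )
)
```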
This output confirms that the dropout layer is correctly placed after the activation in the hidden layer.
In this lesson, you learned how dropout helps prevent overfitting by randomly turning off neurons during training, making your neural network more robust and better at generalizing to new data. You also saw how easy it is to add a dropout layer to your PyTorch models using nn.Dropout. The key change is to insert the dropout layer after the activation function in your hidden layers.
Next, you will get a chance to practice adding dropout to neural networks yourself. This hands-on experience will help you understand how dropout works in practice and how it can improve your models. When you are ready, move on to the practice exercises to apply what you have learned!
