Introduction: Why Use Batch Normalization?

Welcome back! So far, you have learned how to improve your neural networks using dropout and early stopping. These techniques help your models generalize better and avoid overfitting. In this lesson, I will introduce you to another important tool: batch normalization.

Batch normalization is a method that helps neural networks train faster and more reliably. When training deep networks, the distribution of each layer’s inputs can change as the model learns. This is called “internal covariate shift,” and it can slow down training or make it harder for the model to converge. Batch normalization addresses this by normalizing the inputs to each layer, making the training process more stable and often allowing you to use higher learning rates. This technique is widely used in modern deep learning and can make a big difference in both training speed and final model performance.

How Batch Normalization Works

The main idea behind batch normalization is to normalize the activations of a layer for each mini-batch during training. This means that for each batch of data, the layer’s outputs are adjusted to have a mean of zero and a standard deviation of one. After this normalization, the layer applies two learnable parameters, often called gamma and beta: one scales the normalized values and the other shifts them. This allows the network to still represent the original data if needed, but with the added benefit of more stable and faster training.
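To make this concrete, here is a minimal sketch of that computation on a toy mini-batch (the tensor sizes are arbitrary, and in a real layer gamma and beta would be learned during training rather than fixed):

```python
import torch

# A toy mini-batch: 4 samples, 3 features (sizes chosen only for illustration)
x = torch.randn(4, 3)

# Step 1: normalize each feature to zero mean and unit standard deviation
eps = 1e-5                              # small constant for numerical stability
mean = x.mean(dim=0)                    # per-feature mean over the batch
var = x.var(dim=0, unbiased=False)      # per-feature variance over the batch
x_hat = (x - mean) / torch.sqrt(var + eps)

# Step 2: scale and shift with learnable parameters (gamma starts at 1, beta at 0)
gamma = torch.ones(3)
beta = torch.zeros(3)
y = gamma * x_hat + beta
```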

In practice, batch normalization layers are usually placed after a linear (or fully connected) layer and before the activation function. For example, if you have a sequence of a linear layer followed by a ReLU activation, you would insert the batch normalization layer between them. This placement helps ensure that the activations going into the non-linearity are well-behaved, which can make the network easier to train.

Adding Batch Normalization to a PyTorch Model

In PyTorch, adding batch normalization to your model is straightforward. You use the nn.BatchNorm1d layer for fully connected networks (like multilayer perceptrons, or MLPs). The number you pass to nn.BatchNorm1d should match the number of features coming out of the previous linear layer.

For example, if your linear layer outputs 64 features, you would use nn.BatchNorm1d(64). You place this batch normalization layer right after the linear layer and before the activation function. This is important because batch normalization works best when it normalizes the raw outputs of the linear transformation before any non-linear activation is applied.

Note: If you are working with image data and convolutional neural networks (CNNs), you should use nn.BatchNorm2d, which is designed for 2D feature maps (such as those produced by convolutional layers). For 3D volumetric data (like medical images or video), use nn.BatchNorm3d. The number you pass to these layers should match the number of channels (feature maps) output by the previous convolutional layer.
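For instance, a small convolutional block might look like the following sketch (the channel counts here are arbitrary; the key detail is that nn.BatchNorm2d receives the number of output channels of the convolution before it):

```python
import torch.nn as nn

# Example convolutional block:
# nn.BatchNorm2d(16) matches the 16 output channels of the preceding Conv2d.
conv_block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
```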

On CodeSignal, PyTorch is already installed for you, so you do not need to worry about setup. However, if you are working on your own device, you would need to import PyTorch as usual:
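```python
import torch
import torch.nn as nn
```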

Example: MLP with Batch Normalization

Let’s look at a concrete example of how to add batch normalization to a simple MLP in PyTorch. Here is one way to write such a model with nn.Sequential, placing batch normalization right after the first linear layer:
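```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),    # first linear layer: 20 input features -> 64 outputs
    nn.BatchNorm1d(64),   # normalizes the 64 outputs for each mini-batch
    nn.ReLU(),            # activation applied after normalization
    nn.Linear(64, 1),     # second linear layer: one output unit
    nn.Sigmoid(),         # sigmoid for binary classification
)
```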

In this example, the model starts with a linear layer that takes 20 input features and outputs 64. Immediately after this, a batch normalization layer is added with nn.BatchNorm1d(64). This normalizes the 64 outputs from the linear layer for each mini-batch. After normalization, the ReLU activation is applied, followed by another linear layer and a sigmoid activation for binary classification.
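As a quick sanity check, you could pass a random mini-batch through the model (the batch size of 32 here is arbitrary):

```python
import torch

x = torch.randn(32, 20)   # 32 samples with 20 features each
probs = model(x)          # shape (32, 1); each value is between 0 and 1
```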

By adding batch normalization, you help the model train more smoothly. You may notice that the training loss decreases more quickly, and the model is less sensitive to the choice of learning rate. The output of the model will still be a value between 0 and 1, just like before, but the training process should be more stable.

Summary and Practice Preview

In this lesson, you learned what batch normalization is and why it is a valuable tool for training neural networks. You saw how batch normalization works by normalizing activations within each mini-batch, and you learned where to place batch normalization layers in your model. You also walked through a practical example of adding batch normalization to a PyTorch MLP.

Next, you will get a chance to practice adding batch normalization to your own models. This hands-on experience will help you see the benefits of batch normalization in action and give you more confidence in building and improving neural networks. When you are ready, move on to the practice exercises to apply what you have learned!
