Welcome to the first lesson of the "Improving Neural Networks with PyTorch" course. In this course, you will learn practical ways to make your neural networks perform better and avoid common pitfalls. We start with one of the most important steps in any machine learning project: evaluating your model. Evaluation helps you understand how well your model is learning and whether it is likely to perform well on new, unseen data. This is especially important in deep learning, where models can easily become too complex and start to "memorize" the training data — a problem known as overfitting.
Overfitting happens when a model learns the training data too well, including its noise and outliers, and as a result, performs poorly on new, unseen data. The model essentially "memorizes" the training set instead of learning general patterns. On the other hand, underfitting occurs when a model is too simple to capture the underlying structure of the data, resulting in poor performance on both the training and validation sets.
In this lesson, you will learn how to set up a simple neural network using PyTorch, train it on a dataset, and evaluate its performance using a validation set. This foundation will prepare you for more advanced techniques in later lessons, such as dropout, early stopping, and batch normalization.
Before you can train a neural network, you need to prepare your data. In this example, we will use scikit-learn to generate a synthetic classification dataset. This is a common approach for learning and testing, as it allows you to focus on the model itself without worrying about data collection.
First, we use `make_classification` from scikit-learn to create a dataset with 1,000 samples and 20 features. The features are then scaled using `StandardScaler`, which is important because neural networks often train better when input features are on a similar scale. After scaling, we split the data into training and validation sets using `train_test_split`. The training set is used to fit the model, while the validation set helps us check how well the model is doing on data it hasn't seen before.
In real-world scenarios, it's helpful to ensure reproducibility when splitting the data. You can do this by setting the `random_state` parameter in `train_test_split`. This way, every time you run the code, you get the same split between training and validation sets.
Since PyTorch models work with tensors, we convert the NumPy arrays from scikit-learn into PyTorch tensors. For binary classification, the target labels are reshaped to have a single column, which matches the output of our neural network.
Here is the code for preparing the data:
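The sketch below follows the steps described above; the 80/20 split ratio and the specific `random_state` value are illustrative choices.

```python
import torch
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Scale the features so they are all on a similar scale
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split into training and validation sets (80/20 here), with a fixed
# random_state so the split is reproducible across runs
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Convert the NumPy arrays to PyTorch tensors; reshape the labels to a
# single column to match the shape of the network's output
X_train = torch.tensor(X_train, dtype=torch.float32)
X_val = torch.tensor(X_val, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
y_val = torch.tensor(y_val, dtype=torch.float32).reshape(-1, 1)
```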
After running this code, you will have your data ready for training and evaluation in PyTorch.
Now that the data is ready, let's build a simple neural network using PyTorch. We will use a Multi-Layer Perceptron (MLP), which is a basic type of feedforward neural network. In this example, the network has one hidden layer with 64 units and uses the `ReLU` activation function. The output layer uses a `sigmoid` activation, which is standard for binary classification tasks.
Let's break down the model architecture and some key design choices:
- ReLU Activation Function: The hidden layer uses the ReLU (Rectified Linear Unit) activation function. ReLU introduces non-linearity into the model, which allows it to learn complex patterns in the data. It is also computationally efficient and helps avoid the vanishing gradient problem that can occur with other activation functions like sigmoid or tanh.
- Model as a Class Inheriting from `nn.Module`: The model is defined as a class that inherits from `nn.Module`. This is the standard way to build models in PyTorch. Inheriting from `nn.Module` gives you access to useful methods and makes it easy to manage model parameters, move the model between devices (CPU/GPU), and save or load the model.
- The `forward` Method: The `forward` method defines how the input data flows through the network layers. When you call the model on an input (e.g., `model(X_train)`), PyTorch automatically calls the `forward` method to compute the output.
Here is the code for the model:
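A minimal version consistent with the description above is shown below; the class name `SimpleMLP` and its default argument values are illustrative.

```python
import torch.nn as nn

class SimpleMLP(nn.Module):
    def __init__(self, input_size=20, hidden_size=64):
        super().__init__()
        self.hidden = nn.Linear(input_size, hidden_size)   # hidden layer, 64 units
        self.relu = nn.ReLU()                              # non-linearity
        self.output = nn.Linear(hidden_size, 1)            # single output unit
        self.sigmoid = nn.Sigmoid()                        # squashes output to (0, 1)

    def forward(self, x):
        # Input flows through: hidden layer -> ReLU -> output layer -> sigmoid
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))

model = SimpleMLP()
```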
Evaluating your model as it trains is key to understanding how well it is learning. After each training epoch, we switch the model to evaluation mode and compute the loss on the validation set. This helps you see if the model is improving and whether it might be starting to overfit.
It's important to monitor both the training loss and the validation loss during training. Printing only the validation loss is not sufficient for understanding model performance. By tracking both, you can see if the model is overfitting (training loss decreases while validation loss increases) or underfitting (both losses remain high).
- Overfitting: Training loss keeps decreasing, but validation loss starts to increase. The model is learning the training data too well and not generalizing.
- Underfitting: Both training and validation loss remain high. The model is too simple or not trained enough to capture the underlying patterns in the data.
In evaluation mode, we use `torch.no_grad()` to avoid tracking gradients, which saves memory and computation. The training loss and validation loss are printed at each epoch, so you can monitor progress.
Here is the relevant part of the code:
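The loop below is a sketch of this training-and-evaluation pattern. Since the lesson describes a sigmoid output for binary classification, it pairs naturally with `nn.BCELoss`; the Adam optimizer, the learning rate of 0.001, and the count of 50 epochs are assumptions for illustration.

```python
criterion = nn.BCELoss()  # binary cross-entropy, matching the sigmoid output
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # illustrative choice

num_epochs = 50
for epoch in range(num_epochs):
    # Training step: forward pass, loss, backward pass, parameter update
    model.train()
    optimizer.zero_grad()
    train_preds = model(X_train)
    train_loss = criterion(train_preds, y_train)
    train_loss.backward()
    optimizer.step()

    # Validation step: switch to evaluation mode and disable gradient tracking
    model.eval()
    with torch.no_grad():
        val_preds = model(X_val)
        val_loss = criterion(val_preds, y_val)

    print(f"Epoch {epoch + 1}/{num_epochs}, "
          f"Training Loss: {train_loss.item():.4f}, "
          f"Validation Loss: {val_loss.item():.4f}")
```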
The output will look something like the following; the exact loss values depend on the random seed and will vary from run to run:
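```
Epoch 1/50, Training Loss: 0.6872, Validation Loss: 0.6731
Epoch 2/50, Training Loss: 0.6547, Validation Loss: 0.6419
...
Epoch 50/50, Training Loss: 0.1873, Validation Loss: 0.2245
```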
A decreasing training and validation loss means the model is learning. If the validation loss starts to increase while the training loss keeps decreasing, it may be a sign of overfitting.
Let's put everything together. Here is the complete script that prepares the data, builds the model, trains it, and evaluates its performance by printing both the training and validation loss at each epoch:
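This sketch simply combines the pieces from the previous sections, with the same illustrative choices of loss function, optimizer, learning rate, and epoch count noted above.

```python
import torch
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# --- Data preparation ---
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X = StandardScaler().fit_transform(X)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train = torch.tensor(X_train, dtype=torch.float32)
X_val = torch.tensor(X_val, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
y_val = torch.tensor(y_val, dtype=torch.float32).reshape(-1, 1)

# --- Model definition ---
class SimpleMLP(nn.Module):
    def __init__(self, input_size=20, hidden_size=64):
        super().__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.output = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.hidden(x))
        return self.sigmoid(self.output(x))

model = SimpleMLP()
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# --- Training and evaluation loop ---
num_epochs = 50
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    train_loss = criterion(model(X_train), y_train)
    train_loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val)

    print(f"Epoch {epoch + 1}/{num_epochs}, "
          f"Training Loss: {train_loss.item():.4f}, "
          f"Validation Loss: {val_loss.item():.4f}")
```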
When you run this script, you will see both the training loss and validation loss printed for each epoch. This gives you a clear view of how your model is performing as it learns.
In this lesson, you learned how to prepare data, build a simple neural network in PyTorch, train it, and evaluate its performance using a validation set. Monitoring both training and validation loss is a key step in making sure your model is learning the right patterns and not just memorizing the training data. You also learned about the problems of overfitting and underfitting, and how to spot them by looking at the loss values. This foundation is essential for improving your models and will help you understand more advanced techniques in the next lessons.
You are now ready to move on to the practice exercises, where you will get hands-on experience with these concepts. This will help you build confidence and prepare for the next steps in improving your neural networks with PyTorch.
