Introduction to Building LSTMs for Time Series Forecasting

Welcome to the next step in your journey through the "Time Series Forecasting with LSTMs" course. In this lesson, we will focus on building and training an LSTM model specifically for time series forecasting using the univariate "Airline Passengers" dataset. As you may recall from the previous lesson, LSTMs are particularly adept at capturing temporal dependencies in sequence data, making them ideal for this task. Our goal is to guide you through the process of constructing an LSTM model that can effectively forecast future values based on historical data.

Understanding the LSTM Model Architecture

Before we dive into the code, let's take a moment to understand the architecture of the LSTM model we will be building. The model consists of several key components:

  • Input Layer: This layer defines the shape of the input data. In our example, the input shape is determined by the sequence length and the number of features. For the airline passengers dataset, we will use a sequence length of 10 and 1 feature (the number of passengers).

  • LSTM Layers: Our model includes two LSTM layers, each with 16 units. The choice of 16 units is a balance between model complexity and computational efficiency: fewer units reduce the risk of overfitting and require fewer computational resources while still capturing essential patterns in the data. These layers are responsible for capturing the temporal dependencies in the data. In PyTorch, nn.LSTM applies the tanh activation to its cell and hidden state updates (with sigmoid-activated gates), which helps the model learn complex patterns.

  • Dense Output Layer: The final layer is a fully connected layer with a single unit. This layer produces the forecasted value based on the learned patterns from the LSTM layers.

Understanding these components will help you grasp how the model processes the input data to generate forecasts.

Preparing the Data: Chronological Train-Test Split

For time series forecasting, it is important to preserve the temporal order of the data when splitting into training and testing sets. Instead of a random split, we use a chronological split to ensure that the model is trained on past data and tested on future data. Here’s how you can perform a chronological train-test split:
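One way to sketch this split, assuming the series has already been windowed into input/target pairs (the shapes and the 80/20 split fraction below are illustrative, not prescribed by the lesson):

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.8):
    """Split windowed data so all training samples precede all test samples in time."""
    split_idx = int(len(X) * train_fraction)
    return X[:split_idx], X[split_idx:], y[:split_idx], y[split_idx:]

# Dummy windowed data: 100 windows of sequence length 10 with 1 feature each
X = np.arange(100 * 10, dtype=np.float32).reshape(100, 10, 1)
y = np.arange(100, dtype=np.float32)

X_train, X_test, y_train, y_test = chronological_split(X, y, train_fraction=0.8)
print(X_train.shape, X_test.shape)  # (80, 10, 1) (20, 10, 1)
```

Note that, unlike scikit-learn's train_test_split with shuffling, simple slicing preserves temporal order.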

This approach ensures that the training set contains the earlier time steps and the test set contains the later time steps, which is essential for realistic time series forecasting.

Building the LSTM Model: Step-by-Step Example

Now, let's walk through the process of building the LSTM model. We will use the PyTorch library to define the model. Here's the code:
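A minimal sketch of such a model, matching the architecture described earlier (two stacked LSTM layers with 16 units and a single-unit output layer); the exact class and argument names are illustrative:

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=16, num_layers=2, output_size=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # Two stacked LSTM layers, 16 units each; batch_first expects (batch, seq, feature)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize the hidden and cell states to zeros
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size)
        out, _ = self.lstm(x, (h0, c0))
        # Use only the last time step's output to produce the forecast
        return self.fc(out[:, -1, :])

model = LSTMModel()
dummy = torch.randn(4, 10, 1)  # batch of 4 sequences, length 10, 1 feature
print(model(dummy).shape)      # torch.Size([4, 1])
```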

In this code, we define an LSTMModel class that inherits from nn.Module. The model consists of an LSTM layer followed by a fully connected layer. The forward method defines how the input data flows through the model. We initialize the hidden and cell states to zeros and pass the input through the LSTM layer, followed by the fully connected layer to produce the forecasted value.

Training the LSTM Model

With the model defined, the next step is to train it using a training loop. Training involves adjusting the model's weights based on the input data to minimize the loss function. Here's how you can train the model:
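The loop below is a self-contained sketch of this procedure. The random tensors and the small Sequential network are stand-ins so the example runs on its own; in practice you would use your chronologically split tensors and the LSTMModel class from this lesson, and the learning rate, batch size, and epoch count are illustrative choices:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data and model so the sketch is runnable in isolation
X_train = torch.randn(80, 10, 1)
y_train = torch.randn(80, 1)
model = nn.Sequential(nn.Flatten(), nn.Linear(10, 1))  # placeholder for LSTMModel()

criterion = nn.MSELoss()  # mean squared error loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loader = DataLoader(TensorDataset(X_train, y_train), batch_size=16)

model.train()
for epoch in range(5):  # use more epochs in practice
    for X_batch, y_batch in loader:
        optimizer.zero_grad()                      # clear accumulated gradients
        loss = criterion(model(X_batch), y_batch)  # forward pass
        loss.backward()                            # compute gradients
        optimizer.step()                           # update weights
```

Note that shuffle is left at its default of False here; keeping batches in order is a reasonable default for time series data.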

In this code, we define the mean squared error loss function and the Adam optimizer. The training loop iterates over the dataset in batches, performing forward and backward passes to update the model's weights. The zero_grad method clears the gradients, backward computes the gradients, and step updates the weights.

Making Predictions with the LSTM Model

After training the model, you can use it to make predictions on new data. Here's how you can obtain the predictions:
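A minimal sketch of inference, again using a placeholder model and random test data so it runs on its own (substitute your trained LSTMModel and real test tensor):

```python
import torch
import torch.nn as nn

# Stand-ins so the sketch is runnable in isolation
model = nn.Sequential(nn.Flatten(), nn.Linear(10, 1))  # placeholder for your trained model
X_test = torch.randn(20, 10, 1)

model.eval()               # switch to evaluation mode
with torch.no_grad():      # gradients are not needed for inference
    predictions = model(X_test).numpy()

print(predictions.shape)   # (20, 1)
```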

In this code, we set the model to evaluation mode using model.eval(), which switches layers such as dropout and batch normalization (if present) to inference behavior, and wrap the forward pass in torch.no_grad() so that no gradients are tracked. We then convert the test data to a tensor and pass it through the model to obtain the predictions.

Evaluating the Model with RMSE

After obtaining the predictions, it's important to evaluate the model's performance. One common metric for time series forecasting is the Root Mean Square Error (RMSE), which provides a measure of the differences between predicted and actual values. To accurately calculate RMSE, you need to rescale the predictions and actual values back to their original scale if they were normalized or standardized before training. Here's how you can calculate RMSE and compare it with the feature range:
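The calculation can be sketched as follows. The random arrays stand in for your scaled test targets and model predictions, and the scaler is fitted on an assumed minimum of 104 and maximum of 622 passengers purely for illustration; in practice you would reuse the exact scaler fitted during preprocessing:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Hypothetical scaler; in practice, reuse the scaler fitted during preprocessing
scaler = MinMaxScaler()
scaler.fit(np.array([[104.0], [622.0]]))  # assumed min/max passenger counts

y_test = np.random.rand(20, 1)       # scaled actual values (stand-in)
predictions = np.random.rand(20, 1)  # scaled predictions (stand-in)

# Undo the scaling so the error is expressed in the original units (passengers)
y_test_rescaled = scaler.inverse_transform(y_test)
predictions_rescaled = scaler.inverse_transform(predictions)

rmse = np.sqrt(mean_squared_error(y_test_rescaled, predictions_rescaled))
feature_range = y_test_rescaled.max() - y_test_rescaled.min()
print(f"RMSE: {rmse:.2f}")
if rmse < 0.1 * feature_range:
    print("RMSE is within 10% of the feature range")
```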

In this code, we first rescale the predictions and actual values using the inverse transformation of the scaler used during preprocessing. We then use the mean_squared_error function from sklearn.metrics to calculate the mean squared error between the rescaled actual values y_test_rescaled and the model's rescaled predictions predictions_rescaled, and take the square root of this value to obtain the RMSE. To put the RMSE in context, we calculate the range of the feature values and compare the RMSE to a percentage of this range. A common rule of thumb is that an RMSE below 10% of the feature range suggests reasonable model performance; otherwise, there may be room for improvement.

Summary and Preparation for Practice Exercises

In this lesson, we focused on building and training an LSTM model for time series forecasting using the univariate airline passengers dataset. We explored the model's architecture, including the input layer, LSTM layers, and fully connected output layer. You learned how to define and train the model using PyTorch, and how to make predictions with the trained model. Additionally, we discussed how to evaluate its performance using RMSE. As you move on to the practice exercises, I encourage you to apply what you've learned by building and training your own LSTM models. Experiment with different parameters and datasets to deepen your understanding and improve your forecasting skills. This hands-on practice will solidify the concepts covered in this lesson and prepare you for more advanced topics in the course.
