Introduction to Time Series Forecasting with GRUs

Welcome to the first lesson of the course on Time Series Forecasting with GRUs. In this lesson, we will explore the concept of time series forecasting, which involves predicting future values based on previously observed data points. This is particularly useful in fields like finance, weather prediction, and air quality monitoring. We will focus on using Gated Recurrent Units (GRUs), a type of recurrent neural network, to handle multivariate time series data. GRUs are well-suited for this task because they can capture temporal dependencies in data efficiently.

To make our learning concrete, we will use the UCI Air Quality dataset. This dataset contains various air quality measurements, and we will use it to predict future temperature values. By the end of this lesson, you will understand how to preprocess data, build a GRU model, and train it for forecasting.

Data Processing Recap

Before we dive into building our GRU model, let's quickly recap the data processing steps. We will use the UCI Air Quality dataset, which is available from the UCI Machine Learning Repository. This dataset contains 9,358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The dataset includes various air quality measurements such as CO, NO2, and O3 levels, as well as temperature and relative humidity.

You can load the dataset using the following code:

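A minimal sketch of these steps, assuming the semicolon-separated AirQualityUCI.csv file (which uses comma decimal marks and encodes missing values as -200), a 24-hour input window, and min-max scaling, might look like this:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the raw CSV (semicolon-separated, comma as decimal mark).
df = pd.read_csv('AirQualityUCI.csv', sep=';', decimal=',')
df = df.dropna(axis=1, how='all').dropna(axis=0, how='all')  # drop empty rows/cols

# Keep the features used in this lesson; 'T' (temperature) is also the target.
features = ['CO(GT)', 'NO2(GT)', 'PT08.S5(O3)', 'RH', 'T']
df = df[features]

# Missing values are encoded as -200 in this dataset; replace and forward-fill.
df = df.replace(-200, np.nan).ffill()

# Normalize every column to the [0, 1] range.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(df)

# Turn the series into (input window, next-step target) pairs.
def make_sequences(data, target_col, window=24):
    X, y = [], []
    for i in range(len(data) - window):
        X.append(data[i:i + window])            # shape (window, n_features)
        y.append(data[i + window, target_col])  # temperature one step ahead
    return np.array(X), np.array(y)

X, y = make_sequences(scaled, target_col=features.index('T'))
```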
In this code, we load the dataset, handle missing values, and normalize the data. We then create sequences of data points to serve as input for our GRU model. This preprocessing is crucial for ensuring that our model can learn effectively from the data.

Understanding the GRU Structure

Before we build our GRU model, it’s important to understand how a Gated Recurrent Unit (GRU) works. GRUs are a type of recurrent neural network (RNN) designed to handle sequential data — such as time series or natural language — by learning patterns over time.

What makes GRUs effective is their ability to control how information flows through the network using two gates: the update gate and the reset gate. These gates allow the model to selectively keep or discard information from the past, helping it focus on what matters most in each step of the sequence.

GRUs are often preferred over LSTMs for their simpler architecture, which leads to faster training while still delivering strong performance.

Here is a diagram of a GRU cell. In the next section, we’ll explore the different parts of this diagram to understand how the GRU processes information.

Inside the GRU Cell

Each GRU cell receives two inputs:

  • x_t: the input at the current time step
  • h_{t-1}: the hidden state from the previous time step (which carries memory)

Update Gate

The update gate determines how much of the previous hidden state to retain. It is computed as:

z_t = \sigma(W_z x_t + U_z h_{t-1})

If z_t is close to 1, the model keeps most of the past. If it’s close to 0, it focuses more on the current input.

Reset Gate

The reset gate controls how much of the past to forget when computing new information:

r_t = \sigma(W_r x_t + U_r h_{t-1})

A low value of r_t allows the model to “reset” and ignore the past, which is useful for capturing short-term patterns.

Candidate Hidden State

Using the reset gate, the model calculates a candidate hidden state:

\tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))

Here, \odot denotes element-wise multiplication (not a dot product). This candidate state mixes new input with a gated version of past memory.

Final Hidden State

The final output of the GRU cell is a blend of the previous hidden state and the candidate:

h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t

This allows the GRU to choose between keeping the old information or updating it with new content.
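To make these equations concrete, here is a toy NumPy sketch of a single GRU step; the sizes and random weights are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n_in, n_hid = 5, 4           # toy sizes: 5 input features, 4 hidden units
x_t = rng.normal(size=n_in)  # current input x_t
h_prev = np.zeros(n_hid)     # previous hidden state h_{t-1}

# Random matrices standing in for learned weights (biases omitted, as above).
W_z, W_r, W = (rng.normal(size=(n_hid, n_in)) for _ in range(3))
U_z, U_r, U = (rng.normal(size=(n_hid, n_hid)) for _ in range(3))

z_t = sigmoid(W_z @ x_t + U_z @ h_prev)          # update gate
r_t = sigmoid(W_r @ x_t + U_r @ h_prev)          # reset gate
h_tilde = np.tanh(W @ x_t + U @ (r_t * h_prev))  # candidate state; * is element-wise
h_t = z_t * h_prev + (1 - z_t) * h_tilde         # blend old memory and candidate

print(h_t)  # every component lies in (-1, 1)
```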

Why Use Tanh?

The GRU uses the tanh activation function to create the candidate hidden state. The tanh function outputs values in the range between -1 and 1:

\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}

This helps the GRU model:

  • Center outputs around zero, making learning more stable
  • Capture both positive and negative signals
  • Avoid the bias introduced by strictly positive activations like sigmoid (which only outputs values between 0 and 1)

The shape of the tanh curve is S-like (similar to sigmoid), but centered at 0. This allows the GRU to generate a more expressive intermediate representation.
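You can verify this range quickly in NumPy:

```python
import numpy as np

x = np.linspace(-4, 4, 9)
print(np.tanh(x))  # outputs fall strictly between -1 and 1, with tanh(0) = 0
```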

Building a GRU Model for Multivariate Forecasting

Now that we have our data and understand how GRUs work, let's build a GRU model using TensorFlow and Keras. Here's how you can define a GRU model:

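A definition matching that description might look like the following sketch; the input shape assumes the 24-step windows over 5 features from the preprocessing code above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, GRU, Dense

model = Sequential([
    Input(shape=(24, 5)),            # 24 time steps, 5 features per step
    GRU(32, return_sequences=True),  # outputs the full sequence of hidden states
    GRU(16),                         # outputs only its final hidden state
    Dense(1),                        # single unit: the forecast temperature
])
model.summary()
```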
In this model, we use two GRU layers. The first GRU layer has 32 units and returns sequences, meaning it outputs the hidden state at every time step of the input sequence. The second GRU layer has 16 units and does not return sequences, so it outputs only its final hidden state. Finally, we use a Dense layer with one unit to predict the temperature. Our features include 'CO(GT)', 'NO2(GT)', 'PT08.S5(O3)', 'RH', and 'T', and our target is the temperature ('T'), which we aim to forecast.

Training the GRU Model

With our model defined, we can now compile and train it. Compiling the model involves specifying the optimizer and loss function. We will use the Adam optimizer and mean squared error (MSE) as the loss function. Here's how you can train the model:

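Concretely, the compile-and-fit step can be as short as this; the epoch count, batch size, and validation split below are illustrative starting points rather than tuned values:

```python
# Adam optimizer + mean squared error loss for this regression task.
model.compile(optimizer='adam', loss='mse')

# Train on the sequence data; hold out 10% of samples to monitor progress.
history = model.fit(X, y, epochs=20, batch_size=32, validation_split=0.1)
```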
In this code, we compile the model by specifying the optimizer and loss function:

  • Optimizer: We use the Adam optimizer, which is an algorithm that adjusts the model's weights to minimize the loss function. It determines how the model learns from the data by updating weights during training. Adam is popular due to its efficiency and ability to handle sparse gradients.

  • Loss Function: We use mean squared error (MSE) as the loss function, which measures how well the model's predictions match the actual data. It quantifies the difference between predicted and true values, guiding the optimizer in adjusting the model to improve accuracy. MSE is commonly used for regression tasks as it penalizes larger errors more heavily.

We then train the model using the fit method, specifying the number of epochs (the number of times the model will iterate over the entire dataset) and the batch size (the number of samples processed before the model is updated). Training the model allows it to learn patterns in the data and make accurate predictions.
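Since MSE drives the weight updates, it can help to see exactly what it computes; a direct NumPy equivalent with made-up numbers:

```python
import numpy as np

y_true = np.array([18.2, 19.0, 20.5])  # actual temperatures
y_pred = np.array([18.0, 19.4, 20.0])  # model predictions
mse = np.mean((y_true - y_pred) ** 2)  # squaring penalizes large errors more
print(mse)                             # approximately 0.15
```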

Comparing RNN, LSTM, and GRU

To better understand the differences between basic RNNs, LSTMs, and GRUs, let's compare their key features in the table below:

| Feature | Basic RNNs | LSTMs | GRUs |
| --- | --- | --- | --- |
| Architecture | Simple recurrent structure | Memory cells with input, output, and forget gates | Simplified version of LSTM with update and reset gates |
| Handling Long-Term Dependencies | Limited due to vanishing gradient problem | Effective with memory cells and gates | Effective with fewer gates and simpler structure |
| Complexity | Low | High due to multiple gates | Moderate, fewer gates than LSTM |
| Training Time | Fast | Slower due to complexity | Faster than LSTM, slower than basic RNN |
| Performance | Limited for long sequences | High for capturing long-term dependencies | Comparable to LSTM, often preferred for efficiency |
| Use Cases | Short sequences, simple tasks | Tasks requiring long-term context | Efficient for a wide range of tasks, especially with limited resources |

In summary, while basic RNNs are suitable for simple tasks with short sequences, LSTMs and GRUs are better equipped to handle long-term dependencies. GRUs offer a good balance between complexity and performance, making them a popular choice for many time series forecasting applications.
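One concrete way to see the complexity difference is to compare trainable parameter counts in Keras for the same layer width; the 24-step, 5-feature input below matches our dataset setup:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, SimpleRNN, LSTM, GRU

# Same input shape and width for all three recurrent layer types.
for layer_cls in (SimpleRNN, LSTM, GRU):
    m = Sequential([Input(shape=(24, 5)), layer_cls(32)])
    print(layer_cls.__name__, m.count_params())
# SimpleRNN has the fewest parameters, LSTM the most, and GRU sits in between.
```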

Summary and Next Steps

In this lesson, we introduced the concept of time series forecasting with GRUs and walked through the process of building and training a GRU model for multivariate forecasting. We covered data preprocessing, model construction, and training, providing you with a solid foundation for using GRUs in time series forecasting.

As you move on to the practice exercises, you'll have the opportunity to apply what you've learned and experiment with different configurations. In future lessons, we will explore how to evaluate the performance of our GRU model and delve into advanced GRU techniques to enhance forecasting accuracy. Keep up the great work, and let's continue to build on this foundation!
