Introduction

Welcome to the third and final lesson of our "Q-Learning Unleashed: Building Intelligent Agents" course! In our previous lessons, we've explored the fundamentals of Q-learning, including how agents learn values for state-action pairs and update their knowledge through interactions with an environment.

So far, we've primarily focused on how Q-learning works and the mathematical foundation behind it. Today, we'll turn to mastering two more crucial aspects:

  1. How to encapsulate our Q-learning algorithm in a well-structured class;
  2. How to use a trained Q-table to make intelligent decisions.

By the end of this lesson, we'll have a complete, reusable Q-learning agent that can navigate environments based on learned knowledge. Let's dive in!

Creating an Object-Oriented Q-Learning Agent

One of the best practices in Reinforcement Learning (and in Machine Learning in general) is to encapsulate our algorithms into classes. This approach provides several benefits:

  • Organization: Keeps related data and functionality together.
  • Reusability: Makes it easy to use the agent in different environments.
  • Extensibility: Allows for straightforward modifications or enhancements.

Let's create a QLearningAgent class that will house our Q-learning algorithm. This class will feature an __init__ constructor, a learn method that updates the Q-table based on experience, and an act method used for decision-making. Starting with the constructor:
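A minimal sketch of what this constructor might look like is shown below; the hyperparameter names (learning_rate, discount_factor) and their default values are illustrative assumptions rather than the course's exact code:

```python
import numpy as np
from collections import defaultdict


class QLearningAgent:
    def __init__(self, actions, learning_rate=0.1, discount_factor=0.9):
        # Actions the agent can choose from (e.g., [-1, 1] for left/right)
        self.actions = actions
        # Learning rate (alpha): how strongly new information overrides old estimates
        self.learning_rate = learning_rate
        # Discount factor (gamma): how much future rewards are valued
        self.discount_factor = discount_factor
        # Q-table mapping each state to an array of Q-values, one per action,
        # automatically initialized to zeros for states we haven't seen yet
        self.q_table = defaultdict(lambda: np.zeros(len(actions)))
```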

Using a defaultdict is particularly convenient here because we don't need to explicitly check if a state exists in our Q-table before accessing it. If we try to access a state that doesn't exist yet, it will automatically be created with all Q-values initialized to zero.

This approach simplifies our code and allows us to focus on the core learning algorithm rather than data structure management.

Learning from Experience

Next, we add the learning procedure in a learn method:
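A sketch of how this method could be written, continuing the QLearningAgent class above (the argument names, including passing the action by index, are assumptions):

```python
    def learn(self, state, action_idx, reward, next_state):
        # Current estimate for the chosen action in the current state
        current_q = self.q_table[state][action_idx]
        # Best achievable Q-value from the next state
        max_next_q = np.max(self.q_table[next_state])
        # Temporal-difference target: immediate reward plus discounted future value
        td_target = reward + self.discount_factor * max_next_q
        # Move the estimate a fraction (learning_rate) of the way toward the target
        self.q_table[state][action_idx] = current_q + self.learning_rate * (td_target - current_q)
```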

This method implements the Q-learning update rule that we've already discussed extensively in the previous lessons:

Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s',a') - Q(s,a)]

Making Decisions with Q-Tables

The agent makes decisions by consulting its Q-table, which contains estimated values for each action in every state. When the agent needs to act, it looks up all possible actions for its current state and selects the one with the highest Q-value, called the greedy action. This highest-valued action represents the agent's best current understanding of which choice will maximize its total future rewards. In essence, the Q-table serves as the agent's "brain", encoding what it has learned about the long-term consequences of different decisions in different situations.

Let's implement this functionality via an act method in our QLearningAgent class:
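Continuing the same sketch, the method might look like this:

```python
    def act(self, state):
        # Index of the action with the highest Q-value for this state
        action_idx = np.argmax(self.q_table[state])
        # Return the greedy action along with its index
        return self.actions[action_idx], action_idx
```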

This method:

  1. Finds the index of the action with the highest Q-value using np.argmax;
  2. Returns both the action itself and the index of that action.

Visualizing the Agent's Strategy

Once our agent has learned Q-values, we can extract and visualize the policy — the strategy that the agent follows to reach the goal. A policy maps each state to the best action to take in that state.

Let's implement a function to visualize the policy derived from our Q-table:
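One possible sketch is a simple text-based printout; the function name, its parameters, and the arrow rendering below are assumptions (the original lesson's visualization may differ):

```python
def visualize_policy(agent, n_states, goal_state):
    # Arrows for display, assuming the actions are -1 (left) and 1 (right)
    arrows = {-1: "<-", 1: "->"}
    print("Policy (best action per state):")
    for state in range(n_states):
        if state == goal_state:
            print(f"State {state}: GOAL")
            continue
        # Greedy action and its Q-values for this state
        best_action, _ = agent.act(state)
        q_values = agent.q_table[state]
        print(f"State {state}: {arrows[best_action]}  Q-values: {q_values}")
```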

This visualization:

  1. Shows arrows pointing to the best direction to move from each position
  2. Labels each position with its state number
  3. Displays the actual Q-values for each position and action pair

This type of visualization is incredibly useful for understanding what the agent has learned and for debugging purposes. It transforms abstract numbers into a meaningful strategy that we can interpret at a glance.

Putting It All Together

Now let's see how we can combine all these components to create and use a Q-learning agent with pre-defined Q-values. In a real application, these Q-values would be learned through training, but for demonstration purposes, we'll initialize them directly:
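A sketch of how this demonstration might be assembled from the pieces above; the specific Q-values (0.0 and 1.0) are arbitrary placeholders chosen only to encode the intended directions:

```python
# Line-world setup: 10 positions, with the goal at position 5
n_states = 10
goal_state = 5

# Create the agent with the two possible actions: -1 (left) and 1 (right)
agent = QLearningAgent(actions=[-1, 1])

# Manually set Q-values to encode a sensible policy toward the goal
for state in range(n_states):
    if state < goal_state:
        # Left of the goal: moving right (index 1) is more valuable
        agent.q_table[state] = np.array([0.0, 1.0])
    elif state > goal_state:
        # Right of the goal: moving left (index 0) is more valuable
        agent.q_table[state] = np.array([1.0, 0.0])

# Display the resulting policy and Q-values
visualize_policy(agent, n_states, goal_state)
```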

In this example:

  1. We create a line-world with 10 positions and set the goal at position 5
  2. We initialize an agent with the possible actions (-1 for left, 1 for right)
  3. We manually set the Q-values to create a sensible policy:
    • If the agent is to the left of the goal, moving right has a higher value
    • If the agent is to the right of the goal, moving left has a higher value
  4. We display the resulting policy and Q-values

The output shows arrows pointing toward the goal from every position, exactly as we would expect! This demonstrates that our agent knows the correct direction to move from any state to reach the goal.

Conclusion and Next Steps

Congratulations on completing the final lesson of our "Q-Learning Unleashed: Building Intelligent Agents" course!

In this final lesson, we've explored how to structure a Q-learning agent using object-oriented programming, enabling us to encapsulate the algorithm in a reusable and organized manner. We also learned how to use a Q-table to make intelligent decisions within an environment and how to extract and visualize a policy from the learned Q-values. These skills are foundational for applying Q-learning to various problems beyond our simple line-world example.

As you move forward to the practice exercises, you'll gain hands-on experience using and extending the Q-learning agent we've built, solidifying your understanding and preparing you for more complex environments. Remember, Reinforcement Learning is both a science and an art, requiring experimentation and intuition to create effective agents. Keep exploring, and happy learning!
