Introduction

Welcome to Lesson 4 of our course "Navigating RL Challenges: Strategies and Future Directions"! So far, we've enhanced our grid world environment with random goals, implemented reward shaping to accelerate learning, and added mines as obstacles that our agent must learn to avoid.

In this lesson, we'll tackle one of the most fundamental challenges in Reinforcement Learning: designing effective state representations. The way we represent the environment state to our agent can dramatically impact learning efficiency, generalization ability, and overall performance. As we've seen in previous lessons, our agent has been receiving raw state information, such as the agent's position and the goal's position. While this approach works, it's not the most efficient representation for learning; real-world RL applications often require careful feature engineering to help agents learn more effectively from the same experiences.

By the end of this lesson, we'll have implemented an improved state representation that extracts meaningful features from our Grid World environment, allowing our agent to learn more efficiently and generalize better across different scenarios.

The Importance of State Representation in RL

As we mentioned before, the observation (or state representation) is the agent's "actual view" of the environment — it's all the information the agent can use to make decisions. Poor state representations can lead to several problems:

  1. Slow learning: When states contain irrelevant information or aren't structured optimally, agents require more samples to learn effective policies.
  2. Limited generalization: Raw state coordinates don't generalize or scale well. For example, being at position (3,4) with a goal at (5,5) is conceptually the same as being at (1,1) with a goal at (3,2) — both require moving southeast.
  3. Large state spaces: Using raw coordinates creates a state space that scales with the environment size, making learning more difficult.
  4. Difficulty capturing relevant relationships: Important information like "am I moving toward the goal?" isn't explicit in raw coordinates.

A well-designed state representation should include only relevant information for decision-making, encode meaningful relationships and patterns, support generalization across similar situations, and be compact enough for efficient learning. In our grid world navigation task, instead of using raw coordinates, we can represent the state in terms of relationships between the agent, goal, and obstacles — information that directly informs the optimal policy.

Designing an Improved State Representation

Let's design an improved state representation for our grid world environment. We'll focus on four key elements that capture the essential information needed for effective navigation:

  1. Directional information: Instead of raw coordinates, we'll represent the direction to the goal as a compass direction (N, NE, E, SE, S, SW, W, NW). This creates a representation that's independent of the specific coordinates.
  2. Distance information: We'll include the normalized Manhattan distance to the goal. Normalizing by grid size makes this feature work consistently regardless of environment dimensions.
  3. Danger awareness: We'll retain the information about mines in adjacent cells that we introduced in the previous lesson, providing the agent with critical safety information.
  4. Boundary awareness: We'll include a simple indicator of whether the agent is at the edge of the grid, which can help the agent learn boundary constraints more quickly.

This representation offers significant advantages over raw coordinates. It's compact, using just a few values instead of raw coordinates; generalizable, since the same representation works for goals in any location; informative, directly encoding the relationships that matter for decision-making; and consistent, working the same way regardless of grid size. By transforming the raw state into these more meaningful features, we're giving our agent a head start in understanding the environment. Let's see how an improved state representation might look (the agent is green, the mines are red):

Implementing the Improved Observation Function, Step 1

Now let's implement our improved state representation by creating a method that transforms the raw environment state into our more informative representation:
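
The lesson's original listing isn't reproduced here, so the following is a minimal sketch of this first step. It assumes the environment stores the agent and goal positions as (row, col) tuples in self.state and self.goal_state, with row indices increasing toward the south:

```python
def _get_improved_observation(self):
    """Build a compact, relationship-based observation for the agent."""
    agent_row, agent_col = self.state
    goal_row, goal_col = self.goal_state

    # Direction vector from the agent to the goal.
    dr = goal_row - agent_row
    dc = goal_col - agent_col

    # Convert the direction vector into one of eight compass directions,
    # plus a ninth option ('GOAL') when the agent is already on the goal.
    if dr == 0 and dc == 0:
        direction = 'GOAL'
    elif dr < 0 and dc == 0:
        direction = 'N'
    elif dr < 0 and dc > 0:
        direction = 'NE'
    elif dr == 0 and dc > 0:
        direction = 'E'
    elif dr > 0 and dc > 0:
        direction = 'SE'
    elif dr > 0 and dc == 0:
        direction = 'S'
    elif dr > 0 and dc < 0:
        direction = 'SW'
    elif dr == 0 and dc < 0:
        direction = 'W'
    else:  # dr < 0 and dc < 0
        direction = 'NW'
```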

The first part of our implementation determines the relative direction to the goal. We calculate the direction vector (dr, dc) and then convert it to one of eight compass directions (or a ninth option when the agent is already at the goal). This makes the state representation invariant to the specific coordinates — the agent only needs to know which direction to move relative to its current position, a far more intuitive way to understand spatial relationships than raw coordinates.

Implementing the Improved Observation Function, Step 2

Let's continue with the rest of the implementation:
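
Again as a sketch, the continuation below completes the same _get_improved_observation method. It assumes the mine positions are stored in a set of (row, col) tuples called self.mines and the grid dimension in self.grid_size; rounding the distance is one possible way to keep the state space discrete for tabular learning, and the lesson's actual code may handle this differently:

```python
    # Normalized Manhattan distance to the goal, roughly in [0, 1].
    # Dividing by 2 * grid_size keeps the scale consistent across grid sizes.
    distance = (abs(dr) + abs(dc)) / (2 * self.grid_size)

    # Danger awareness: is there a mine in each of the four adjacent cells?
    # Order of checks: up, down, left, right.
    adjacent = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    mines_near = tuple(
        1 if (agent_row + d_row, agent_col + d_col) in self.mines else 0
        for d_row, d_col in adjacent
    )

    # Boundary awareness: 1 if the agent is on any edge of the grid, else 0.
    at_edge = int(
        agent_row == 0 or agent_row == self.grid_size - 1
        or agent_col == 0 or agent_col == self.grid_size - 1
    )

    # Combine everything into a compact, hashable tuple for tabular learning.
    return (direction, round(distance, 2), mines_near, at_edge)
```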

The completed implementation adds three more elements to our state representation. First, a normalized distance to the goal (divided by 2 × grid size), giving a value between 0 and 1 that works consistently regardless of grid dimensions. Second, information about nearby mines in the four adjacent cells, helping the agent avoid dangerous moves. Third, a binary indicator for whether the agent is at an edge of the grid, which helps the agent learn boundary constraints. All of these features are combined into a compact tuple that forms our improved state representation, giving the agent precisely the information it needs in a format that facilitates efficient learning.

Integrating the New Representation: Reset Method

With our improved observation function in place, we need to update the environment to use it throughout the learning process. Let's modify the reset and step methods to return our enhanced state representation:
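
Here is a rough sketch of the updated reset method; the start-of-episode setup is abbreviated, and helper names such as _sample_random_goal and _place_mines are placeholders for the existing logic from earlier lessons:

```python
def reset(self):
    """Start a new episode and return the improved observation."""
    # Re-initialize the agent, goal, and mines as in earlier lessons.
    # The helper names below stand in for that existing setup logic.
    self.state = (0, 0)
    self.goal_state = self._sample_random_goal()
    self.mines = self._place_mines()

    # Previously we returned (self.state, self.goal_state, mines_near);
    # now we return the enhanced representation instead.
    return self._get_improved_observation()
```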

In this updated reset method, instead of returning the raw state tuple (self.state, self.goal_state, mines_near) as in our previous implementation, we now call our new _get_improved_observation() method to get the enhanced representation. This ensures that from the very beginning of each episode, our agent receives the more informative state representation we've designed.

Integrating the New Representation: Step Method

Similarly, we update the step method to use our improved observation:
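
Again only a sketch: _apply_action and _compute_reward_and_done are placeholder names for the movement, reward-shaping, mine, and goal logic built in earlier lessons:

```python
def step(self, action):
    """Apply an action and return (observation, reward, done)."""
    # Move the agent and compute the shaped reward, mine penalty, and
    # termination flag as in earlier lessons (placeholder helpers below).
    self.state = self._apply_action(action)
    reward, done = self._compute_reward_and_done()

    # The only change: return the improved observation rather than the
    # raw (self.state, self.goal_state, mines_near) tuple.
    return self._get_improved_observation(), reward, done
```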

By replacing the raw state tuple with our improved observation in both the reset and step methods, we've fully integrated our enhanced state representation into the environment. This consistent use of the improved representation throughout the agent's interaction with the environment ensures that the agent always has access to the most relevant, information-rich view of its situation, enabling more efficient learning.

Learning with the Improved Representation

Now let's see how our agent learns using this enhanced state representation. The training process itself remains unchanged: we run episodes and collect statistics on rewards, success rate, and mine hits. The difference happens behind the scenes, where our improved state representation makes learning more efficient. With the enhanced representation, our agent doesn't have to learn the complex relationship between raw coordinates and optimal actions from scratch; much of this information is already encoded in the representation itself.

Let's train and test the agent:
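
The course's exact training script isn't reproduced here; the sketch below only illustrates the idea, with GridWorldEnv, QLearningAgent, and their methods standing in as assumed interfaces for the classes built earlier in the course:

```python
# GridWorldEnv and QLearningAgent stand in for the classes built earlier
# in the course; their exact names and signatures are assumptions here.
env = GridWorldEnv(grid_size=5, num_mines=3)
agent = QLearningAgent(actions=[0, 1, 2, 3], alpha=0.1, gamma=0.99, epsilon=0.1)

# Training loop: identical to previous lessons, only the observations differ.
for episode in range(500):
    state = env.reset()
    done = False
    while not done:
        action = agent.choose_action(state)
        next_state, reward, done = env.step(action)
        agent.update(state, action, reward, next_state)
        state = next_state

# Testing: run greedy episodes and count how often the agent reaches the goal.
successes = 0
for _ in range(100):
    state = env.reset()
    done = False
    while not done:
        state, _, done = env.step(agent.choose_action(state, greedy=True))
    # In the sketch above, the first element of the observation is the
    # compass direction, and 'GOAL' means the agent is on the goal cell.
    successes += int(state[0] == 'GOAL')

print(f"Success rate: {successes}%")
```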

Running the above code results in the following output:

These impressive results — a perfect success rate of 100% and complete avoidance of mines — highlight the power of good state representation. By encoding relevant information in an efficient way, we've enabled our agent to learn a robust navigation policy in a complex environment with significantly fewer training episodes than would be required with raw coordinates.

Conclusion and Next Steps

In this lesson, we've explored the critical role that state representation plays in Reinforcement Learning effectiveness. By thoughtfully designing a representation that captures directional relationships, normalized distances, danger awareness, and boundary information, we've created a more efficient learning environment for our agent. This approach demonstrates a fundamental principle in applied Reinforcement Learning: incorporating domain knowledge into your state design can dramatically improve learning efficiency and performance, often making the difference between success and failure in complex environments.

As you continue your Reinforcement Learning journey, consider how you might design state representations for different types of problems. Each domain has its own set of relevant features and relationships that can be encoded to facilitate learning. In the practice exercises that follow, you'll have the opportunity to experiment with different state representations and observe their impact on learning performance, further solidifying your understanding of this crucial aspect of Reinforcement Learning design.
