In the previous lesson, you built a stateless reducer agent, but it has two problems: the system prompt is hardcoded inside the Python code, and the raw context list is passed directly to the model without any formatting.
This means you cannot easily review or version your prompts, and the model sees technical noise like "type", "call_id", and "arguments" instead of clean, structured information. In this lesson, we implement Factor 2 (Own your prompts) and Factor 3 (Own your context window). We extract the system prompt into a versioned markdown file and build a context serializer that turns the raw context into clean, focused text the model can reason over without distraction.
Before we create the prompt files and the serializer, let's reorganize the project structure.
We'll extend the existing layout with two dedicated directories, one for prompts and one for utilities, to separate concerns and keep the codebase maintainable.
Here's what each new component adds to our architecture:
- prompts/ directory: Houses all prompt-related files as versioned markdown documents, treating them as first-class artifacts alongside your codebase.
  - base_system.md: Defines the agent's role and behavioral requirements in a structured, reviewable format.
  - context_format.md: Provides the template for how context should be presented to the model.
- utils/ directory: Contains helper modules that don't fit into the core agent logic.
  - context_serializer.py: Transforms raw context dictionaries into the clean markdown format defined by our template.
This structure separates three concerns: agent logic (the reducer loop and control flow), prompt definitions (what the agent should do and how it should see information), and utilities (supporting functions that transform data). By organizing files this way, you can modify prompts without touching Python code, review prompt changes in version control, and test different prompt versions independently. Now let's create these files and implement the externalization.
Following the structure we just planned, let's create the first prompt file at src/core/prompts/base_system.md. This file will define the agent's role and core requirements in a structured, readable format.
The # ROLE section defines the model's identity, while the # REQUIREMENTS section provides specific rules that guide behavior, including the critical instruction to call the final_answer tool when complete. The # EXTRA INSTRUCTIONS placeholder allows you to append custom instructions without modifying the base file.
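A minimal sketch of such a file follows, held in a Python string so it can be checked in isolation; the ROLE and REQUIREMENTS wording is hypothetical, not the lesson's exact text. The small section_headings helper (also hypothetical) checks one property the loading code relies on: # EXTRA INSTRUCTIONS must be the last heading, so any text appended to the file lands in that section.

```python
import re
from typing import List

# Hypothetical contents of src/core/prompts/base_system.md (wording is illustrative).
BASE_SYSTEM_MD = """# ROLE
You are a careful math assistant that solves problems step by step using tools.

# REQUIREMENTS
- Use the provided tools for every calculation.
- Do not repeat an action that has already been completed.
- When the task is complete, call the final_answer tool.

# EXTRA INSTRUCTIONS
"""


def section_headings(markdown: str) -> List[str]:
    # Top-level headings only: lines that start with a single "# ".
    return re.findall(r"^# (.+)$", markdown, flags=re.MULTILINE)


print(section_headings(BASE_SYSTEM_MD))  # ['ROLE', 'REQUIREMENTS', 'EXTRA INSTRUCTIONS']
```

A check like this could run in CI so that a prompt edit that moves or renames the placeholder section is caught in review, just like a code change.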
This is significant because your prompt is now a versionable artifact — you can track changes over time, review prompt modifications in pull requests just as you review code, and deploy new prompt versions independently of code changes. Now, we need to update the Agent to load this prompt from the file.
We need to update the Agent class constructor to load the system prompt from the file instead of using an inline string. We will use the Path class from Python's pathlib module to locate the file relative to the agent module, following the structure we planned.
The code uses Path(__file__).resolve().parent to locate the prompts directory relative to the current file rather than the current working directory, so the lookup works no matter where the program is launched from. The read_text(encoding="utf-8") method loads the entire markdown file as a string, and we append extra_instructions to the end to fill the # EXTRA INSTRUCTIONS section.
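A minimal sketch of the updated constructor, assuming the agent module lives in src/core/ next to the prompts/ directory. The prompts_dir parameter and the _default_prompts_dir helper are hypothetical additions so the sketch can be exercised without the real project layout; the lesson's constructor computes the path directly.

```python
from pathlib import Path
from typing import Optional


def _default_prompts_dir() -> Path:
    # Resolve relative to this module rather than the working directory,
    # so the prompts are found no matter where the program is launched from.
    return Path(__file__).resolve().parent / "prompts"


class Agent:
    def __init__(self, extra_instructions: str = "", prompts_dir: Optional[Path] = None) -> None:
        # prompts_dir is a hypothetical override (e.g. for tests); by default
        # the directory next to this module is used.
        prompts_dir = prompts_dir if prompts_dir is not None else _default_prompts_dir()
        base_prompt = (prompts_dir / "base_system.md").read_text(encoding="utf-8")
        # base_system.md ends with the "# EXTRA INSTRUCTIONS" heading, so
        # appending here fills that section without editing the file.
        self.system_prompt = base_prompt + extra_instructions
```

Because the default path is derived from __file__ only when no override is given, the same class works unchanged when the package is imported from a different working directory.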
Your prompt is now externalized, versioned, and maintainable without touching Python code. Next, we need to control how the context is formatted before it reaches the model.
The raw context list contains dictionaries with technical keys that add noise for the model. We need to transform this into a clean, structured format that highlights what matters: the user's request and what actions have already been completed. This is the essence of context engineering — deliberately structuring information to guide model behavior. Let's create the second file in our planned structure: src/core/prompts/context_format.md.
This template has three sections: the user's original question using the {user_message} placeholder, a list of completed actions using the {execution_history} placeholder with an explicit warning not to repeat them, and a clear directive regarding what to do next.
The placeholders are designed for Python string formatting, which makes it easy to inject dynamic content. By structuring information this way, we remove noise and focus the model's attention on the relevant state — this is what Factor 3 means by owning your context window. Now, let's build the serializer that performs this transformation.
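To illustrate how the placeholders behave, here is a sketch with a hypothetical template string standing in for context_format.md; the section titles and directive wording are assumptions, only the {user_message} and {execution_history} placeholders come from the lesson.

```python
# Hypothetical stand-in for src/core/prompts/context_format.md.
CONTEXT_TEMPLATE = """# USER REQUEST
{user_message}

# COMPLETED ACTIONS (do not repeat these)
{execution_history}

# NEXT STEP
Decide the next action, or call final_answer if the task is complete.
"""

# str.format substitutes each {name} placeholder with the matching keyword argument.
rendered = CONTEXT_TEMPLATE.format(
    user_message="What is 3 squared?",
    execution_history="No actions completed yet.",
)
print(rendered)
```

One caveat of str.format: any literal brace in the template (for example, a JSON snippet) must be doubled as {{ or }}, otherwise format() treats it as a placeholder and raises an error.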
The context serializer is a function that takes the raw context list and transforms it into the formatted markdown text. This is where we implement our context engineering strategy. Following our planned structure, we'll create this in src/core/utils/context_serializer.py, starting by loading the template and defining the function signature.
We use Path to locate the template relative to the current file, navigating up to the core directory and then into prompts — exactly as we planned in our file structure. The template is loaded once at module import time and stored in the module-level constant _CONTEXT_TEMPLATE, making subsequent calls efficient. The function returns an empty string immediately if the context is empty. Now, we need to extract the user_message from the raw context.
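A minimal sketch of that path arithmetic, assuming the module lives at src/core/utils/context_serializer.py. The context_template_path helper is hypothetical; it wraps the lookup in a function so it can be checked with a made-up path instead of a real __file__. The comments show how the module-level constant and the empty-context guard would use it.

```python
from pathlib import Path


def context_template_path(module_file: str) -> Path:
    """Locate prompts/context_format.md from a module inside core/utils/."""
    # .../core/utils/context_serializer.py: parent is utils/, parent.parent is core/
    core_dir = Path(module_file).resolve().parent.parent
    return core_dir / "prompts" / "context_format.md"


# Inside context_serializer.py, the template would be read once, at import time:
#     _CONTEXT_TEMPLATE = context_template_path(__file__).read_text(encoding="utf-8")
# and the serializer would return early when there is nothing to render:
#     def serialize_context_to_text(context):
#         if not context:
#             return ""
```

Reading the file at import time means a missing template fails loudly when the module loads, rather than on the first model call.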
The first piece of information we need is the user's original request. We will iterate through the context, looking for an item with "role": "user", and extract its content.
The loop stops at the first item with "role": "user" and takes its "content" field, which holds the original question or request that started the conversation. We only need this first user message, so we break as soon as we find it. Next, we need to format the execution_history to show what tools were called and what results they produced.
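As a sketch of this step, assuming the user item's "content" is a plain string (extract_user_message is a hypothetical helper name; the lesson does this inline):

```python
from typing import Any, Dict, List


def extract_user_message(context: List[Dict[str, Any]]) -> str:
    user_message = ""
    for item in context:
        if item.get("role") == "user":
            # The first user item is the original request; stop looking after it.
            user_message = item.get("content", "")
            break
    return user_message


print(extract_user_message([
    {"role": "user", "content": "Solve x^2 - 5x + 6 = 0"},
    {"type": "function_call", "call_id": "c1", "name": "discriminant", "arguments": "{}"},
]))  # Solve x^2 - 5x + 6 = 0
```

Using item.get rather than item["role"] matters here because tool-call items in the same list have a "type" key but no "role" key.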
To format the execution_history, we first need to build a map of function calls indexed by their call_id. This will allow us to look up the original function call when we encounter its output later.
We iterate through the context looking for items with "type": "function_call" and extract the call_id, call_name, and arguments. The arguments might be a JSON string, so we try to parse them into a dictionary using json.loads.
If parsing succeeds and yields a dictionary, we format the arguments as readable key-value pairs like a=5, b=1. We then store the formatted function signature in call_map under its call_id, so each call_id maps to the function name together with its formatted arguments. With this map ready, we can now format the completed actions.
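The call-map step might look like the following sketch. It assumes function_call items carry the function name under a "name" key, as OpenAI Responses API items do; build_call_map is a hypothetical helper name, and falling back to the raw argument string when parsing fails is an assumption about the lesson's behavior.

```python
import json
from typing import Any, Dict, List


def build_call_map(context: List[Dict[str, Any]]) -> Dict[str, str]:
    call_map: Dict[str, str] = {}
    for item in context:
        if item.get("type") != "function_call":
            continue
        call_id = item.get("call_id")
        call_name = item.get("name")  # assumed key, as in OpenAI Responses items
        arguments = item.get("arguments", "")
        # Arguments usually arrive as a JSON string; try to turn them into a dict.
        try:
            parsed = json.loads(arguments) if isinstance(arguments, str) else arguments
        except json.JSONDecodeError:
            parsed = arguments  # keep the raw text if it is not valid JSON
        if isinstance(parsed, dict):
            args_text = ", ".join(f"{key}={value}" for key, value in parsed.items())
        else:
            args_text = str(parsed)
        call_map[call_id] = f"{call_name}({args_text})"
    return call_map


print(build_call_map([
    {"type": "function_call", "call_id": "c1", "name": "add", "arguments": '{"a": 5, "b": 1}'},
]))  # {'c1': 'add(a=5, b=1)'}
```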
Now, we can format the completed actions by pairing function calls with their outputs using the call_map we just built.
We iterate through the context again, this time looking for "function_call_output" items. For each output, we extract its call_id and output, then look up the formatted function call in our call_map.
We create a line that combines the call signature and the result, prefixed with a checkmark to indicate completion. All lines are joined with newlines to create the execution_history string; if no actions have been completed yet, we use a placeholder message instead. Now, we can fill the template with the extracted values.
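Continuing the sketch, the pairing step could look like this. The "->" separator, the "No actions completed yet." placeholder text, and the unknown_call fallback for an output whose call_id is missing from call_map are all assumptions; format_execution_history is a hypothetical name.

```python
from typing import Any, Dict, List


def format_execution_history(context: List[Dict[str, Any]], call_map: Dict[str, str]) -> str:
    lines = []
    for item in context:
        if item.get("type") != "function_call_output":
            continue
        call_id = item.get("call_id")
        output = item.get("output", "")
        # Look up the readable signature recorded for this call_id.
        call_text = call_map.get(call_id, f"unknown_call({call_id})")
        lines.append(f"✓ {call_text} -> {output}")
    # Fall back to a placeholder so the template section is never empty.
    return "\n".join(lines) if lines else "No actions completed yet."
```

Iterating over outputs rather than calls is what makes the history list only finished actions: a call that never received an output simply produces no line.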
With both the user_message and execution_history extracted and formatted, we can now inject them into the template and return the final result.
The format() method replaces the {user_message} and {execution_history} placeholders in the template with the actual values we extracted and formatted. The result is a clean markdown document that presents the relevant information in a structured, readable way without technical noise from the raw dictionaries. Now, let's integrate this serializer into the Agent.
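Putting the steps together, here is a self-contained sketch of the whole serializer. To keep it runnable on its own, the template is an inline string standing in for context_format.md; its wording is hypothetical, as are the square tool and call IDs in the usage example.

```python
import json
from typing import Any, Dict, List

# Hypothetical stand-in for prompts/context_format.md (normally read from disk).
_CONTEXT_TEMPLATE = """# USER REQUEST
{user_message}

# COMPLETED ACTIONS (do not repeat these)
{execution_history}

# NEXT STEP
Decide the next action, or call final_answer if the task is complete.
"""


def serialize_context_to_text(context: List[Dict[str, Any]]) -> str:
    if not context:
        return ""

    # 1. The first user message is the original request.
    user_message = next(
        (item.get("content", "") for item in context if item.get("role") == "user"), ""
    )

    # 2. Map each call_id to a readable signature such as "add(a=5, b=1)".
    call_map: Dict[str, str] = {}
    for item in context:
        if item.get("type") == "function_call":
            args = item.get("arguments", "")
            try:
                parsed = json.loads(args) if isinstance(args, str) else args
            except json.JSONDecodeError:
                parsed = args
            if isinstance(parsed, dict):
                args = ", ".join(f"{k}={v}" for k, v in parsed.items())
            call_map[item["call_id"]] = f"{item.get('name')}({args})"

    # 3. Pair each output with its call; calls without an output are skipped.
    lines = [
        f"✓ {call_map.get(item.get('call_id'), 'unknown_call')} -> {item.get('output', '')}"
        for item in context
        if item.get("type") == "function_call_output"
    ]
    execution_history = "\n".join(lines) if lines else "No actions completed yet."

    # 4. Fill the placeholders.
    return _CONTEXT_TEMPLATE.format(
        user_message=user_message, execution_history=execution_history
    )


context = [
    {"role": "user", "content": "What is the square of 7?"},
    {"type": "function_call", "call_id": "call_1", "name": "square", "arguments": '{"x": 7}'},
    {"type": "function_call_output", "call_id": "call_1", "output": "49"},
]
print(serialize_context_to_text(context))
```

The printed text contains the request and a single line, ✓ square(x=7) -> 49, with no call_id or type keys left for the model to parse.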
With the serializer built, we need to update the Agent's _call_llm method to use it. Instead of passing the raw context to the model, we serialize it first and pass the formatted text.
The only change is that we call serialize_context_to_text(context) to transform the raw context into formatted markdown, then pass that as the input parameter. Now, the model receives clean, structured text that highlights the user request and completed actions, implementing Factor 3 by explicitly controlling what goes into the context window.
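A sketch of the updated method, assuming an OpenAI-style client whose responses.create accepts model, instructions, input, and tools, as the OpenAI Python SDK's Responses API does. Passing the system prompt via instructions is an assumption about the lesson's code, and the serializer is injected through a hypothetical serialize parameter (in the lesson it is serialize_context_to_text, imported from utils/context_serializer.py) so the sketch runs on its own.

```python
from typing import Any, Callable, Dict, List

Context = List[Dict[str, Any]]


class Agent:
    def __init__(self, client: Any, model: str, system_prompt: str,
                 tools: List[Dict[str, Any]],
                 serialize: Callable[[Context], str]) -> None:
        self.client = client
        self.model = model
        self.system_prompt = system_prompt
        self.tools = tools
        # Hypothetical injection point; the lesson calls serialize_context_to_text directly.
        self.serialize = serialize

    def _call_llm(self, context: Context) -> Any:
        # Factor 3: the model receives rendered text, never the raw list of dicts.
        input_text = self.serialize(context)
        return self.client.responses.create(
            model=self.model,
            instructions=self.system_prompt,
            input=input_text,
            tools=self.tools,
        )
```

The raw context list is untouched by this call, so the agent still keeps the full audit trail while the model sees only the serialized view.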
To understand the impact of Factor 3, imagine the agent is halfway through a task. Instead of the LLM receiving an internal list of JSON-like dictionaries, the serialize_context_to_text function provides a structured prompt. For example, after the agent has calculated the square of a number, the model sees this:
By presenting the state as a markdown document, we guide the model's attention toward the logic of the solution rather than the technical plumbing of the tool-calling loop.
Now let's update main.py to run our agent with its new capabilities. We will initialize the context and let the agent run its reducer loop until it reaches a final answer.
By printing the raw context at the end, we can verify the full audit trail of everything the agent did internally, even though the model only saw the clean, serialized versions during the process.
When we run the complete Agent from main.py, the output shows the agent successfully completing the task while maintaining a detailed internal log.
The Agent completed successfully, solving the quadratic equation by using its math tools to calculate the discriminant and find the roots. The raw context shown above is what we maintain internally for full auditability, keeping track of every call_id and raw dictionary.
Even though we print the raw context for our own records, it is important to remember what the model actually saw during its final step:
This format removes technical noise like call_id and type, showing only the user's request, the completed actions with their results, and a clear directive for the next step. Notice that final_answer doesn't appear in the serialized history: the serializer lists only calls that have a matching function_call_output, and final_answer terminates the loop before any output is recorded. This separation gives us complete records for debugging while providing focused information to the model.
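To see why final_answer drops out, consider a hypothetical end-of-run context for the equation x^2 - 5x + 6 = 0 (discriminant 25 - 24 = 1). The tool names, call IDs, and the completed_action_names helper are illustrative assumptions; the point is only that a call with no recorded output never reaches the history.

```python
from typing import Any, Dict, List


def completed_action_names(context: List[Dict[str, Any]]) -> List[str]:
    """Names of calls that have a recorded output, in the order outputs arrived."""
    names = {
        item["call_id"]: item.get("name", "unknown")
        for item in context
        if item.get("type") == "function_call"
    }
    return [
        names.get(item["call_id"], "unknown")
        for item in context
        if item.get("type") == "function_call_output"
    ]


# final_answer was called, but the loop exited before any output was appended for it.
context = [
    {"role": "user", "content": "Solve x^2 - 5x + 6 = 0"},
    {"type": "function_call", "call_id": "c1", "name": "discriminant", "arguments": '{"a": 1, "b": -5, "c": 6}'},
    {"type": "function_call_output", "call_id": "c1", "output": "1"},
    {"type": "function_call", "call_id": "c2", "name": "final_answer", "arguments": '{"answer": "x = 2 or x = 3"}'},
]
print(completed_action_names(context))  # ['discriminant']
```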
You have now implemented Factor 2 and Factor 3 by extracting your system prompt to a versioned markdown file and building a context serializer that controls exactly what the model sees through context engineering.
Your prompts are now first-class artifacts that can be reviewed, tested, and deployed independently of code, while your context is explicitly curated to remove noise and present information in a structured format. This context engineering approach gives you precise control over model behavior by shaping the information it receives. In the upcoming practice exercises, you will extend these concepts by modifying the templates to experiment with different formatting styles and exploring how changes to the serialized input affect model behavior.
