Introduction

Welcome to Lesson 3 of Codex Subagents & Multi-Agent Orchestration! In our previous lessons, we established the fundamentals of subagent isolation with strict contracts and then scaled up to parallel execution for concurrent analysis. These patterns enable fast, reliable, tightly scoped operations across a codebase.

This lesson introduces a fundamentally different pattern: Multi-Stage Pipelines with Gated Writes. Here, we chain multiple subagents sequentially, where each stage validates the previous one before proceeding. The final stage performs controlled write operations with strict safety checks. By the end, you will understand how to build pipelines that analyze, plan, and modify code while maintaining rigorous guardrails against unintended changes.

The Case for Sequential Stages

When dealing with code modifications, running a single subagent with write permissions carries risks. Without structure, an agent might change unexpected files, introduce bugs, or stray beyond its intended scope. A multi-stage approach mitigates these risks by separating concerns.

Consider a pipeline that writes unit tests. First, we identify which code lacks coverage. Next, we design a test plan based on those findings. Finally, we implement the tests. Each stage produces artifacts that feed into the next, creating a clear audit trail. If any stage fails or produces questionable results, we halt before making changes.

This staged approach offers several advantages: we can review intermediate outputs, each stage has a narrow responsibility, and failures are caught early, before write operations occur. The result is a more predictable, safer automation workflow.

Pipeline Architecture Overview

Our pipeline consists of three sequential stages, each with distinct responsibilities. Stage A analyzes existing code to identify areas needing tests. Stage B generates a detailed test plan using Stage A's findings. Stage C implements the tests according to the plan, with strict validation afterward.

Between stages, we enforce gating logic: Each stage must report "status": "success" for the pipeline to continue. If Stage A or Stage B fails, we halt immediately rather than proceeding to write operations. After Stage C completes, we run validation checks on the modified files, ensuring changes stay within allowed boundaries. This combination of sequential dependencies and post-write validation creates a robust safety framework.

Defining Safety Boundaries

Before building the pipeline stages, we establish explicit safety boundaries as constants:
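A minimal sketch of what these constants might look like. The specific path and limit below are placeholder values for illustration, not prescribed by the course:

```python
# Safety perimeter for the write stage. The exact paths and the limit
# here are illustrative -- tune them to your own repository layout.
ALLOWED_PREFIXES = ("packages/common/tests/",)
MAX_CHANGED_FILES = 4
```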

These constants define our safety perimeter. The ALLOWED_PREFIXES tuple specifies which directories are permissible for modification; any file outside these paths will trigger an error. The MAX_CHANGED_FILES limit prevents scope creep by rejecting changes that touch too many files simultaneously.

Hardcoding these constraints at the module level makes our safety policy transparent and easy to audit. We can adjust these values as our confidence grows or as project requirements change, but having them explicit prevents accidental loosening of restrictions.

The Codex Execution Helpers

To run subagents, we use one helper for analysis stages and another for the write stage:
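A sketch of the two helpers, assuming the subprocess-based pattern from earlier lessons; the exact CLI flags in your environment may differ:

```python
import subprocess

def codex_exec(prompt: str) -> str:
    # Analysis-stage invocation: keep the familiar "-a never" approval
    # behavior and request the JSONL event stream on stdout.
    result = subprocess.run(
        ["codex", "exec", "--json", "-a", "never", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def codex_exec_full_auto(prompt: str) -> str:
    # Write-stage invocation: full-auto mode lets the agent carry out
    # file edits; safety comes from constrained prompts, post-write
    # validation, and git auditing rather than from the sandbox alone.
    result = subprocess.run(
        ["codex", "exec", "--json", "--full-auto", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```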

The codex_exec() helper is used for Stage A and Stage B. It keeps the familiar -a never approval behavior while still returning structured output.

For Stage C, we use codex_exec_full_auto(). This allows the implementation stage to actually carry out file edits. The safety model still depends on constrained prompts, validation after execution, and git-based auditing.

As in the previous unit, stdout is a JSONL event stream produced by codex exec --json, not the final contract object directly. So in this lesson we continue using the same extraction helper pattern you implemented earlier: scan the event lines, find the completed agent message, and parse its text as the contract.

Stage A: Identifying Test Targets

The first stage analyzes code to find untested areas:
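One possible shape for this stage. The prompt wording is paraphrased from the description above, and the `codex_exec` call is shown commented out since it requires the CLI helper:

```python
import os

# Ensure a home for stage artifacts before anything runs.
os.makedirs("artifacts", exist_ok=True)

STAGE_A_PROMPT = """\
You are Stage A of a test-writing pipeline (analysis-only).
Analyze ONLY the code under packages/common/. Do not modify any files.
Identify exactly 3 untested code paths.
Respond with ONLY a JSON object of the form:
{"status": "success",
 "summary": "<one sentence>",
 "targets": [{"file": "<path>", "reason": "<why it needs tests>"}]}
"""

# raw_a = codex_exec(STAGE_A_PROMPT)  # helper from the previous section
```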

We start by ensuring an artifacts/ directory exists for storing stage outputs. The prompt instructs the subagent to analyze only packages/common/, keeping its scope narrow. The task is specific: find exactly three untested code paths. The JSON contract extends our familiar status and summary fields with a targets array containing file paths and justifications.

Notice the (analysis-only) comment: even though the helper is configured with workspace-write, this stage's prompt does not ask for modifications. The agent should inspect files and return analysis.

Parsing and Persisting Stage A

After receiving raw output from Stage A, we parse and validate it:
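A sketch of the parse-persist-gate sequence. The `stage_a` dictionary below is sample data standing in for `require_json(raw_a)`; in the real pipeline it comes from the subagent's output:

```python
import json
import os

# Sample contract standing in for the parsed Stage A output.
stage_a = {
    "status": "success",
    "summary": "Found 3 untested code paths in packages/common/.",
    "targets": [{"file": "packages/common/utils.py",
                 "reason": "parse_config has no tests"}],
}

# Stage-specific field check beyond the shared status/summary schema.
if "targets" not in stage_a:
    raise SystemExit("Stage A contract missing 'targets'")

# Persist the artifact for auditing and for Stage B's prompt.
os.makedirs("artifacts", exist_ok=True)
with open("artifacts/stage_a.json", "w") as f:
    json.dump(stage_a, f, indent=2)

# Gate: halt before planning or writing if the analysis is unreliable.
if stage_a["status"] != "success":
    raise SystemExit(f"Stage A failed: {stage_a['summary']}")
```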

The require_json helper uses the same JSONL extraction pattern from the previous unit, then validates the schema. After the basic validation, we explicitly check for the targets field that this stage must return. We persist the parsed data to artifacts/stage_a.json, creating a permanent record of what the agent identified. This artifact serves multiple purposes: it documents the analysis, enables debugging, and feeds into Stage B.

The critical part is the gate check: If the status field is anything other than "success", we immediately exit using SystemExit. This prevents the pipeline from proceeding to planning or writing when the initial analysis is unreliable.

The JSON Validation Helper

To ensure consistent schema enforcement, we use the same validation shape as before:
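A simplified sketch of the helper. It assumes the contract text has already been extracted from the JSONL event stream (the extraction step itself is the previous unit's helper):

```python
import json

def require_json(raw: str, required=("status", "summary")) -> dict:
    # Parse the extracted contract text and enforce the shared schema.
    data = json.loads(raw)
    for field in required:
        if field not in data:
            raise SystemExit(f"Contract missing required field: {field!r}")
    return data
```

For example, `require_json('{"status": "success", "summary": "ok"}')` returns the parsed dictionary, while a contract missing `summary` halts the pipeline.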

As before, this helper extracts the contract object from the JSONL event stream, then checks for the presence of required fields: status and summary. These fields must exist in every stage's output, so we enforce this schema requirement explicitly. If validation passes, we return the parsed dictionary for the caller to use.

Stage B: Generating the Test Plan

With validated targets from Stage A, we can now design a test plan:
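A sketch of the Stage B prompt construction, with a stand-in for the parsed Stage A artifact; note the `{{`/`}}` escapes for literal JSON braces inside the f-string:

```python
import json

# Stand-in for the parsed Stage A artifact.
stage_a = {"targets": [{"file": "packages/common/utils.py",
                        "reason": "parse_config has no tests"}]}

STAGE_B_PROMPT = f"""\
You are Stage B of a test-writing pipeline. Design a test plan for the
targets below. Do not modify any files.

Targets from Stage A:
{json.dumps(stage_a["targets"], indent=2)}

Respond with ONLY a JSON object of the form:
{{"status": "success",
  "summary": "<one sentence>",
  "test_plan": [{{"file": "<path>", "test_name": "<name>",
                  "approach": "<how to test it>"}}]}}
"""
```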

Notice how we embed Stage A's output directly into the prompt using json.dumps. This creates an explicit data dependency: Stage B receives exactly what Stage A produced, formatted as JSON within the prompt. The double curly braces ({{ and }}) are Python's way of escaping braces in f-strings, allowing us to include literal JSON structure in the template.

The subagent must now create a test_plan array, where each entry specifies a file, test name, and testing approach. This structured plan gives Stage C precise instructions for what to implement.

Persisting and Gating Stage B

Just like Stage A, we validate, persist, and gate Stage B's output:

The pattern repeats: extract and validate the contract, check for the stage-specific test_plan field, write to artifacts/stage_b.json, and check the status gate. Only if Stage B succeeds do we proceed to Stage C, where actual file modifications occur. This layered gating ensures that we never attempt writes based on flawed analysis or incomplete planning.

Stage C: Implementing Tests

Now we reach the write stage, where the subagent modifies actual files:
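A sketch of the Stage C prompt. The constants and the `test_plan` entry are illustrative stand-ins, and the write-capable call is shown commented out:

```python
import json

ALLOWED_PREFIXES = ("packages/common/tests/",)  # illustrative perimeter
MAX_CHANGED_FILES = 4                           # illustrative limit

# Stand-in for the parsed Stage B artifact.
test_plan = [{"file": "packages/common/tests/test_utils.py",
              "test_name": "test_parse_config_empty",
              "approach": "call parse_config with an empty dict"}]

STAGE_C_PROMPT = f"""\
You are Stage C of a test-writing pipeline. Implement the tests below.

Test plan from Stage B:
{json.dumps(test_plan, indent=2)}

Constraints:
- Create or modify files ONLY under: {", ".join(ALLOWED_PREFIXES)}
- Modify at most {MAX_CHANGED_FILES} files in total.

Respond with ONLY a JSON object containing "status" and "summary".
"""

# raw_c = codex_exec_full_auto(STAGE_C_PROMPT)  # write-capable helper
```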

We embed the test_plan from Stage B into this prompt, giving the agent concrete instructions. The constraints are explicit: we use the ALLOWED_PREFIXES constant to dynamically generate the allowed directories statement, and the number of modified files must stay under our MAX_CHANGED_FILES limit.

The comment emphasizes that despite these constraints, we still enforce safety through post-write validation. Trusting the agent's adherence to constraints is not enough; we verify afterward.

Persisting Stage C Output

After Stage C completes, we parse and store its results:
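A sketch of the persist step, using a sample contract. Note the deliberate absence of a `SystemExit` gate here:

```python
import json
import os

# Sample Stage C contract -- a "partial" result is still worth recording.
stage_c = {"status": "partial",
           "summary": "Implemented 2 of 3 planned tests."}

os.makedirs("artifacts", exist_ok=True)
with open("artifacts/stage_c.json", "w") as f:
    json.dump(stage_c, f, indent=2)

# No SystemExit on non-success: edits may already exist on disk, so we
# validate the actual git diff next instead of trusting the self-report.
```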

Unlike previous stages, we do not immediately halt on failure here. Stage C might report "partial" success, which is still valuable. The key difference is that modifications may already have occurred, so we need to validate what actually changed rather than just checking the reported status.

Detecting Changed Files

To validate modifications, we first need to know what files changed using a git_changed_files helper:
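A sketch of the helper, following the description below:

```python
import subprocess

def git_changed_files() -> list[str]:
    # Names of files with uncommitted modifications (no diff bodies).
    out = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Drop empty lines and exclude the pipeline driver itself.
    return [line for line in out.splitlines()
            if line and line != "gated_pipeline.py"]
```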

This function runs git diff --name-only, which lists modified files without showing the actual changes. We split the output into lines and filter out any empty lines using a list comprehension. We also exclude gated_pipeline.py so the pipeline driver file itself does not count against the implementation output during validation.

The result is a clean list of file paths that have uncommitted modifications. This provides the raw data for our validate_changes() checks.

Enforcing File Count Limits

With the list of changed files, we first check the quantity:
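The count check can be sketched as follows, using sample data in place of the real diff output:

```python
MAX_CHANGED_FILES = 4  # illustrative limit

changed = ["packages/common/tests/test_utils.py",
           "packages/common/tests/test_parsing.py"]  # sample diff output

if len(changed) > MAX_CHANGED_FILES:
    raise RuntimeError(
        f"Too many files changed: {len(changed)} > {MAX_CHANGED_FILES}"
    )
```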

If the agent modified more files than allowed, we raise a RuntimeError immediately. This prevents scope creep where an agent gradually expands its changes beyond the intended target. The error message includes both the actual count and the limit, making it clear why validation failed. Catching this early helps us iterate on constraints or agent instructions before changes become too widespread.

Enforcing Directory Restrictions

Next, we verify that all modifications occurred in allowed directories:
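The directory check can be sketched like this, again with sample data standing in for the real diff output:

```python
ALLOWED_PREFIXES = ("packages/common/tests/",)  # illustrative perimeter

changed = ["packages/common/tests/test_utils.py"]  # sample diff output

for path in changed:
    # startswith accepts a tuple and checks every allowed prefix at once.
    if not path.startswith(ALLOWED_PREFIXES):
        raise RuntimeError(f"File outside allowed directories: {path}")
```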

We iterate through each modified file, checking whether its path starts with one of our ALLOWED_PREFIXES. The startswith method accepts a tuple, automatically checking against all allowed prefixes. If any file falls outside these boundaries, we raise a descriptive error identifying the problematic path.

This check prevents agents from accidentally (or deliberately) modifying configuration files, build scripts, or other sensitive areas. By defining the perimeter explicitly and enforcing it programmatically, we create a reliable safety net.

Running the Validation

After Stage C completes, we invoke the validation:

This single line triggers all our safety checks: file count limits and directory restrictions. If validation passes, the pipeline continues. If it fails, a RuntimeError is raised, halting execution and preventing the invalid changes from being committed or deployed. This pattern makes it easy to add additional validation rules in the future; just extend validate_changes with more checks.

Capturing the Git Diff

Finally, we preserve a complete record of what changed:
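A sketch of the capture step, wrapped in a small helper:

```python
import os
import subprocess

def save_git_diff(path: str = "artifacts/git_diff.patch") -> None:
    # Full patch (no --name-only): line-by-line changes for human review.
    diff = subprocess.run(
        ["git", "diff"],
        capture_output=True, text=True, check=True,
    ).stdout
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(diff)
```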

The git diff command (without --name-only) produces a full patch showing line-by-line changes. We capture this output and write it to artifacts/git_diff.patch, creating a human-readable audit trail. Developers can review this patch to understand exactly what the agent modified, making it easier to approve, adjust, or reject the changes.

The Complete Safety Framework

Our pipeline combines multiple safety mechanisms to control subagent behavior. Sequential gating prevents progression unless each stage succeeds. Analysis and planning phases ensure understanding happens before modifications. Explicit constraints in prompts communicate boundaries to the agent. Post-write validation verifies that actual changes conform to rules. Git-based auditing creates a permanent record of modifications.

Together, these layers create defense in depth. Even if an agent misunderstands a constraint or behaves unexpectedly, the validation catches violations before they cause harm. The artifact outputs provide transparency, making it easy to debug failures or review the agent's decision-making process.

Conclusion and Next Steps

We have built a sophisticated multi-stage pipeline that safely coordinates analysis, planning, and controlled writes. This pattern scales to more complex workflows: you can add additional stages, chain multiple pipelines together, or integrate external validation tools. The key principles remain constant: explicit boundaries, sequential gating, and post-write validation.

The architecture explored here represents a significant evolution from simple, isolated subagents. By combining the isolation from Lesson 1, the orchestration from Lesson 2, and the gating logic from this lesson, we can build powerful automation that remains predictable and safe. Each stage produces artifacts, creating full traceability from initial analysis to final implementation.

Now, it is time to apply these concepts hands-on! The practice exercises ahead will challenge you to build your own multi-stage pipelines, experiment with different validation strategies, and handle edge cases where stages fail or produce unexpected results.
