Introduction

Welcome back to Codex Subagents & Multi-Agent Orchestration! You have completed the first lesson where we explored subagent isolation and output contracts. Those foundational concepts enable us to run reliable, contained AI operations with predictable results.

In this second lesson, we are scaling up: Parallel Subagents + Deterministic Aggregation. Instead of running a single isolated subagent, we will orchestrate multiple subagents working concurrently on different parts of a codebase. Each subagent will analyze a specific scope, and we will combine their results into a unified report. By the end of this lesson, you will understand how to build supervisor scripts that coordinate parallel AI work while maintaining full observability and control.

Why Parallel Execution Matters

When analyzing a large codebase, running subagents sequentially can be time-consuming. If each subagent takes 30 seconds, analyzing three different packages would require 90 seconds of sequential execution. With parallel execution, all three run simultaneously, reducing total time to roughly 30 seconds.

Beyond speed, parallelism offers other benefits:

  • Scope isolation: Each subagent stays focused on its assigned area without contamination from other scopes.
  • Independent failures: If one subagent fails, others can still succeed and return useful results.
  • Resource utilization: Modern systems have multiple cores; parallel execution leverages this capacity.

The challenge lies in coordination: launching multiple processes, collecting their outputs, and merging results into a coherent whole. This is where deterministic aggregation becomes essential.

The Architecture

Our approach involves three key components working together. First, a runner function executes individual subagents, capturing their output and metadata. Second, a parallel executor launches multiple runners concurrently and collects results as they complete. Third, an aggregator function combines these individual results into a unified report.

This architecture maintains the isolation principles from our previous lesson while adding coordination. Each subagent still operates within strict boundaries with a JSON contract, but now we are managing multiple subagents as a cohesive workflow. The supervisor script becomes the orchestrator, deciding what tasks to run, monitoring their execution, and synthesizing their findings.

Running a Single Subagent

Let us start by examining how we execute a single subagent programmatically:
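A minimal sketch of the prompt-construction half of the runner. The contract wording below is illustrative; reuse the exact JSON contract your subagents were given in the previous lesson.

```python
def build_prompt(scope: str, task: str) -> str:
    # Embed the scope and task into the instruction template, and
    # restate the strict JSON contract from the previous lesson.
    return f"""
You are a read-only analysis subagent.
Analyze ONLY files under: {scope}
Task: {task}
Respond with STRICT JSON only, in this shape:
{{"status": "success" | "partial" | "failed",
 "summary": "<short summary>",
 "findings": [{{"file": "<path>", "note": "<observation>"}}]}}
No prose outside the JSON object.
""".strip()
```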

The run_agent function takes two parameters: a scope (directory path) and a task (what to analyze). It constructs a prompt dynamically, embedding these values into the instruction template. Notice the familiar contract from our previous lesson; we are enforcing the same strict JSON output format. The .strip() method removes leading and trailing whitespace, ensuring clean input to the codex command.

Executing the Subagent Process

With our prompt ready, we launch the subagent using Python's subprocess module:
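A sketch of the launch step. The exact `codex` subcommand and flags depend on your CLI version; `codex exec --json` is assumed here as the non-interactive mode the lesson describes.

```python
import subprocess

def launch_subagent(prompt: str) -> subprocess.CompletedProcess:
    # Non-interactive run; structured CLI events arrive as JSONL on stdout.
    return subprocess.run(
        ["codex", "exec", "--json", prompt],
        capture_output=True,  # capture both stdout and stderr
        text=True,            # decode bytes to str
    )
```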

This runs the subagent in non-interactive mode and emits structured CLI events as JSONL on stdout. The capture_output=True argument captures both stdout and stderr, while text=True returns string output. The completed process object proc includes returncode, stdout, and stderr for supervisor logic.

Capturing Timing Metrics

Before launching the subagent, we record the start time; after completion, we calculate runtime_ms:
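The timing pattern itself is a few lines; the sleep below merely stands in for the subagent call.

```python
import time

start = time.time()
# ... launch the subagent here (placeholder delay for illustration) ...
time.sleep(0.01)
runtime_ms = int((time.time() - start) * 1000)  # elapsed wall time in ms
```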

This simple pattern provides crucial observability. By measuring how long each subagent takes, we can identify performance bottlenecks, compare different task types, and track improvements over time. The conversion to milliseconds (* 1000) and integer casting give us precise, easy-to-compare metrics. We will attach this runtime_ms value to each subagent's result for later analysis.

Extracting the JSON Contract from CLI Events

Because --json returns event lines (not a single JSON object), the supervisor should extract the contract payload from the completed agent message event. A reusable helper keeps this logic consistent across tasks:
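A sketch of that helper. The event schema here (`item.completed` events carrying an `agent_message` item with a `text` field) is an assumption based on the description above; check the schema your CLI version actually emits.

```python
import json

def extract_contract(stdout: str) -> dict:
    """Find the completed agent_message event and parse its text as the contract."""
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON noise on stdout
        item = event.get("item", {})
        if event.get("type") == "item.completed" and item.get("type") == "agent_message":
            # The agent's message text is itself the JSON contract.
            return json.loads(item["text"])
    raise ValueError("no agent_message event with contract JSON found")
```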

This helper parses each JSONL event, selects the agent_message, and then parses item.text into the final contract object. If the expected payload is missing, it raises a clear error that the caller can handle.

Handling Invalid Output

After the subprocess completes, capture process metadata first, then parse with the helper:
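One way to capture that metadata, sketched as a small helper (the field names are illustrative):

```python
def process_meta(scope: str, task: str, proc, runtime_ms: int) -> dict:
    # Record supervisor-level metadata before any parsing is attempted,
    # so diagnostics survive even if the contract turns out unparseable.
    return {
        "scope": scope,
        "task": task,
        "returncode": proc.returncode,
        "runtime_ms": runtime_ms,
        "stderr_tail": proc.stderr[-2000:],  # bounded slice for logs
    }
```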

If contract parsing fails, return a standardized failure payload with diagnostics:
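A sketch of that fallback, where `parse` stands for the extraction helper (passed in so the pattern is easy to test in isolation):

```python
import json

def parse_with_fallback(stdout: str, parse, meta: dict) -> dict:
    # Any parse failure becomes a structured "failed" record instead of an
    # exception that would take down the whole orchestration.
    try:
        return parse(stdout)
    except (ValueError, KeyError, json.JSONDecodeError) as exc:
        return {
            "status": "failed",
            "summary": "",
            "findings": [],
            "failure_reason": f"unparseable subagent output: {exc}",
            "meta": meta,
        }
```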

This pattern ensures one failing subagent does not crash the entire orchestration, and it produces a failure record that can be inspected later without re-running.

Enriching the Result

When contract parsing succeeds, enrich the subagent's output with supervisor-level metadata:
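The enrichment step can be as simple as attaching the supervisor's fields to the parsed contract (field names here are illustrative):

```python
def enrich(contract: dict, scope: str, task: str,
           runtime_ms: int, returncode: int) -> dict:
    # Attach supervisor-level metadata so every result is self-describing.
    contract["scope"] = scope
    contract["task"] = task
    contract["runtime_ms"] = runtime_ms
    contract["returncode"] = returncode
    return contract
```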

Sometimes a process can exit with a non-zero code while still producing parseable contract JSON. Recording that as a reason makes those cases visible in the final artifact:
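A small sketch of that check, assuming a `failure_reason` field as the place such diagnostics are recorded:

```python
def flag_exit_code(contract: dict, returncode: int) -> dict:
    # Valid contract JSON plus a non-zero exit is suspicious but usable;
    # record why instead of silently discarding the result.
    if returncode != 0:
        contract.setdefault(
            "failure_reason", f"process exited with code {returncode}")
    return contract
```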

Launching Parallel Agents

Now we reach the core of parallel execution. We define our jobs and use a thread pool to run them concurrently:
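A sketch of the fan-out, with a stand-in `run_agent` so the block runs on its own; in the real supervisor, `run_agent` is the runner built above.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(scope: str, task: str) -> dict:
    # Stand-in for the real runner; returns an enriched contract dict.
    return {"scope": scope, "task": task, "status": "success",
            "summary": f"analyzed {scope}", "findings": []}

jobs = [
    ("packages/math/", "identify test coverage gaps"),
    ("packages/element/", "find performance hotspots"),
    ("excalidraw-app/", "suggest error handling improvements"),
]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Each submit schedules one subagent run and returns a Future.
    futures = [pool.submit(run_agent, scope, task) for scope, task in jobs]
```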

The jobs list defines three distinct tasks, each analyzing a different directory with a specific objective. The ThreadPoolExecutor with max_workers=3 creates a pool capable of running three subagents simultaneously. For each job, pool.submit schedules the run_agent function to execute with the given scope and task, returning a Future object that will eventually contain the result.

Collecting Results

As subagents finish, collect results in completion order:
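A self-contained sketch of completion-order collection; the stand-in runner's random latency makes the ordering visible in practice.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_agent(scope: str, task: str) -> dict:
    # Stand-in runner with variable latency.
    time.sleep(random.uniform(0.01, 0.05))
    return {"scope": scope, "status": "success", "findings": []}

jobs = [
    ("packages/math/", "identify test coverage gaps"),
    ("packages/element/", "find performance hotspots"),
    ("excalidraw-app/", "suggest error handling improvements"),
]

results = []
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_agent, scope, task) for scope, task in jobs]
    for fut in as_completed(futures):  # yields each future as it finishes
        results.append(fut.result())
```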

The as_completed iterator yields futures as they finish, regardless of submission order. This means faster subagents contribute their results immediately rather than waiting for slower ones. The fut.result() call retrieves the actual return value from run_agent, which is our enriched dictionary containing status, findings, and metadata. We accumulate these in the results list for aggregation.

Aggregation Strategy

With all results collected, aggregation builds a deterministic, traceable report. Start with a consistent skeleton:
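One possible skeleton; the field names are assumptions about the report shape, not a fixed schema.

```python
from datetime import datetime, timezone

aggregated = {
    "generated_at": datetime.now(timezone.utc).isoformat(),
    "overall_status": "success",  # may be downgraded during processing
    "agents": [],                 # per-subagent status entries
    "failures": [],               # failure records with reasons
    "combined_findings": [],      # merged findings from all scopes
}
```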

Process each result using a simple status lattice. Failed results are recorded in the failures list, while partial results still contribute summaries and findings but downgrade the overall status:
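A sketch of that loop over sample results (the result shapes are illustrative):

```python
results = [
    {"scope": "packages/math/", "status": "success", "summary": "ok",
     "findings": [{"file": "packages/math/utils.ts", "note": "untested branch"}]},
    {"scope": "excalidraw-app/", "status": "failed", "summary": "",
     "findings": [], "failure_reason": "unparseable output"},
]

aggregated = {"agents": [], "failures": [], "combined_findings": []}

for result in results:
    aggregated["agents"].append(
        {"scope": result["scope"], "status": result["status"]})
    if result["status"] == "failed":
        aggregated["failures"].append(
            {"scope": result["scope"],
             "reason": result.get("failure_reason", "unknown")})
    else:
        # Success and partial results both contribute their findings.
        aggregated["combined_findings"].extend(result.get("findings", []))
```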

Finally, compute the overall status deterministically: if everything failed, mark the run as "failed"; otherwise, any failed or partial result downgrades the run to "partial"; only all-success results keep the overall status at "success":
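That rule is deterministic and easy to express as a small function:

```python
def overall_status(statuses: list) -> str:
    # All failed -> "failed"; any failed or partial -> "partial";
    # otherwise every subagent succeeded.
    if statuses and all(s == "failed" for s in statuses):
        return "failed"
    if any(s in ("failed", "partial") for s in statuses):
        return "partial"
    return "success"
```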

This preserves useful output from partial runs while ensuring the top-level status accurately reflects degraded subagent outcomes.

Writing the Output

Finally, we serialize the aggregated results to a JSON file:
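A sketch of the write step; the `report.json` filename is an assumption.

```python
import json
from pathlib import Path

aggregated = {"overall_status": "success", "agents": [], "combined_findings": []}

# Create the output directory if needed, without failing when it exists.
Path("artifacts").mkdir(exist_ok=True)
report_path = Path("artifacts") / "report.json"
report_path.write_text(json.dumps(aggregated, indent=2), encoding="utf-8")
```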

The Path("artifacts").mkdir(exist_ok=True) ensures our output directory exists without failing if it is already there. We then write the aggregated data with indent=2 for human readability and specify utf-8 encoding for consistent text handling. This creates a persistent artifact that can be consumed by other tools, archived for auditing, or inspected manually.

Putting It All Together

Our complete system orchestrates three parallel subagents analyzing different packages: packages/math/ for test coverage gaps, packages/element/ for performance hotspots, and excalidraw-app/ for error handling improvements. Each subagent runs independently within its scope, returns structured JSON through CLI events, and contributes to a combined report.

The workflow executes in four phases: job definition, parallel execution, result collection, and deterministic aggregation. Because each phase is cleanly separated, we can easily adjust the number of workers, add new jobs, or modify the aggregation logic without touching the core subagent runner. This modularity makes the system maintainable and extensible.

Conclusion and Next Steps

We have built a complete parallel subagent orchestration system with deterministic aggregation. The pattern we explored scales from three subagents to dozens; the same architecture handles larger workloads with minimal changes. You now understand how to launch concurrent AI tasks, collect their outputs, handle failures gracefully, and synthesize results into actionable reports.

This capability becomes particularly powerful when combined with the isolation principles from our previous lesson. Each subagent operates safely within its boundaries, yet together they provide comprehensive coverage of a large codebase. The aggregated output gives you a unified view while preserving traceability back to individual sources.

Time to make this knowledge stick! In the upcoming practice section, you will build your own parallel orchestration workflows, experimenting with different task combinations, aggregation strategies, and error handling approaches.
