Introduction: The Context Problem in Long Sessions

When you build a production API feature with 10 tasks, you face a critical choice: execute all tasks in one long session, or break them into independent units with fresh context each time.

In a traditional approach, you might give Codex CLI a complete specification and say "implement all 10 tasks." This seems efficient—one instruction, one session, done. However, there's a hidden risk: as the session progresses through tasks, the probability of forgetting specification details increases. By task 10, important validation rules or patterns from the beginning might be forgotten.

This is both a probability problem and a technical limitation. Ten tasks mean ten opportunities to miss something. Even if each task independently has a 95% chance of being implemented correctly, the chance that all ten are correct is only 0.95^10 ≈ 60%, so there is roughly a 40% chance of at least one error. Additionally, as the session grows, you may hit context window limits, forcing earlier information to be dropped.
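To put a number on that risk, here is a small Python sketch. It assumes every task succeeds independently with the same probability, which is a simplification (real sessions are unlikely to be this uniform); the function name is made up for illustration.

```python
def prob_at_least_one_error(per_task_success: float, num_tasks: int) -> float:
    """Chance that at least one of num_tasks independent tasks goes wrong."""
    # All tasks succeed with probability p ** n, so the complement is the risk.
    return 1.0 - per_task_success ** num_tasks


# Show how the risk grows as more tasks share one session.
for n in (1, 5, 10):
    print(f"{n:>2} tasks: {prob_at_least_one_error(0.95, n):.1%}")
```

Under this assumption the risk climbs from 5% for a single task to about 40% for ten tasks.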

With agent orchestration, you solve this automatically. Instead of one long session, a Main Agent coordinates the work. For each task, it delegates to a fresh Subagent that starts with a clean slate—AGENTS.md is auto-loaded, specification files are loaded when you reference them with @, and there's no accumulated context from previous tasks.

How Orchestration Works: The Main Agent Pattern

The orchestration pattern has two roles:

Main Agent (Coordinator):

  • Reads the task list from tasks.md
  • For each task, spawns a specialized subagent
  • Receives completion reports
  • Waits for human approval
  • Tracks progress
  • Stops at phase boundaries for review

Subagent (Executor):

  • Receives one specific task instruction
  • Gets fresh context: AGENTS.md auto-loaded, specification files you reference with @, task instructions
  • Implements the task following test-first workflow
  • Runs self-validation
  • Reports structured results

Spawning Subagents

To delegate work, ask the Main Agent to spawn a subagent with a specific role and instruction.

For example:
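One possible phrasing, sketched as a small Python helper that assembles the instruction text. The wording of the request, the task ID T001, and the path specs/comments/tasks.md are illustrative assumptions, not a fixed Codex CLI syntax; only the task-executor role name and the @ reference convention come from this lesson.

```python
def build_spawn_instruction(role: str, task_id: str, references: list[str]) -> str:
    """Compose a plain-language request for the Main Agent (hypothetical wording)."""
    # Prefix each path with @ so the file is loaded into the subagent's context.
    refs = " ".join(f"@{path}" for path in references)
    return (
        f"Spawn a subagent with the {role} role. "
        f"Task: implement {task_id}. "
        f"Context files: {refs}"
    )


instruction = build_spawn_instruction(
    role="task-executor",
    task_id="T001",  # hypothetical task ID
    references=["specs/comments/specification.md", "specs/comments/tasks.md"],
)
print(instruction)
```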

This tells the Main Agent to spawn a subagent using the task-executor role defined in your .codex/config.toml under [agents.task-executor].

Here's what a typical orchestration flow looks like:
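As a rough illustration, the flow can be simulated in plain Python. This is a toy model of the pattern, not how Codex CLI is implemented: run_subagent, orchestrate, the Report class, and the task names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Report:
    task: str
    context_seen: list[str]
    status: str = "done"


def run_subagent(task: str, spec_files: list[str]) -> Report:
    # The context is rebuilt from scratch on every call: nothing carries over.
    context = ["AGENTS.md", *spec_files, f"instruction: {task}"]
    return Report(task=task, context_seen=context)


def orchestrate(tasks, spec_files, approve) -> list[Report]:
    """Main Agent loop: delegate one task at a time, pausing for approval."""
    completed = []
    for task in tasks:
        report = run_subagent(task, spec_files)
        if not approve(report):
            break  # stop here until a human requests a fix
        completed.append(report)
    return completed


reports = orchestrate(
    tasks=["T001: add Comment model", "T002: add create endpoint"],
    spec_files=["specs/comments/specification.md"],
    approve=lambda report: report.status == "done",  # stand-in for human review
)
for r in reports:
    print(r.task, "->", len(r.context_seen), "context items")
```

Every subagent in this simulation sees the same three context items, no matter how many tasks ran before it.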

The key insight: each subagent starts fresh. There's no accumulated context decay.

What Subagents Automatically Receive

When you spawn a subagent, Codex CLI automatically provides:

  1. AGENTS.md: Your project instructions are auto-loaded. The subagent knows your coding standards, patterns, and conventions without you having to repeat them.

  2. Files you reference: Any file you reference with @ in your instruction is loaded into the subagent's context. For example: @specs/comments/specification.md or @src/models/task.py.

  3. Task instructions: The specific instruction you provide when spawning the agent.

What subagents do NOT get:

  • Previous agent outputs (unless you explicitly include them)
  • Accumulated conversation history from the main session
  • Files not explicitly referenced with @

This design is intentional—it prevents context pollution and ensures each task gets exactly the context it needs, no more, no less.
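The rules above can be sketched as a small context builder. This is a simplified model under stated assumptions: the regular expression is a guess at what counts as an @ reference, and referenced_files and build_context are hypothetical helpers rather than Codex CLI functions.

```python
import re

# Assumed reference syntax: "@" followed by path characters.
AT_REFERENCE = re.compile(r"@([\w./-]+)")


def referenced_files(instruction: str) -> list[str]:
    """Return the paths mentioned with @ in an instruction."""
    # Strip a trailing period so "@file.md." at the end of a sentence still works.
    return [match.rstrip(".") for match in AT_REFERENCE.findall(instruction)]


def build_context(instruction: str) -> list[str]:
    """AGENTS.md, then the referenced files, then the instruction. Nothing else."""
    return ["AGENTS.md", *referenced_files(instruction), instruction]


instruction = (
    "Implement the Comment model per @specs/comments/specification.md, "
    "following the style of @src/models/task.py"
)
print(referenced_files(instruction))
# → ['specs/comments/specification.md', 'src/models/task.py']
```

A file that is not named with @ never appears in the result, which mirrors the "no more, no less" behavior described above.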

Creating Agent Role Definitions

Before you can spawn a subagent with a specific role, you need to define it. Agent roles are configured in your .codex/config.toml and their detailed settings live in separate TOML files under .codex/agents/.

First, register the role in your .codex/config.toml:

Field meanings:

  • description: When the agent should be used—Codex reads this when deciding which role to spawn.
  • config_file: Path to a TOML config layer applied when Codex spawns an agent with this role. Relative paths resolve from the config.toml that defines the role.

Then, create the role-specific config file at .codex/agents/task-executor.toml:

Config field meanings:

  • model: AI model to use (e.g., gpt-5.3-codex, gpt-5.3-codex-spark, gpt-5.4).

Structured Completion Reports

When a subagent finishes its task, it doesn't just say "done." It provides a structured completion report that makes human review trivial:
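One way to model such a report is a small Python dataclass. The field names, the sample values, and the 80% coverage threshold are assumptions chosen for illustration; the actual report layout depends on what your role instructions ask the subagent to produce.

```python
from dataclasses import dataclass


@dataclass
class CompletionReport:
    task: str
    tests_passed: int
    tests_failed: int
    coverage_percent: float
    type_errors: int
    files_changed: list[str]
    commit_message: str

    def ready_for_approval(self, min_coverage: float = 80.0) -> bool:
        """Quick gate: no failing tests, no type errors, enough coverage."""
        return (
            self.tests_failed == 0
            and self.type_errors == 0
            and self.coverage_percent >= min_coverage
        )


# Made-up sample values, only to show the shape of a report.
report = CompletionReport(
    task="T001",
    tests_passed=8,
    tests_failed=0,
    coverage_percent=92.5,
    type_errors=0,
    files_changed=["src/models/comment.py", "tests/test_comment.py"],
    commit_message="feat(comments): add Comment model",
)
print(report.ready_for_approval())  # → True
```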

This report tells you immediately:

  • Did the tests pass?
  • Is coverage adequate?
  • Are types clean?
  • What files changed?
  • What commit message to use?

With this information, a review takes about 3 minutes, because you don't have to reconstruct what happened.

The Three-Level Validation Strategy

Professional orchestration uses strategic checkpoints instead of reviewing every detail of every task:

Level 1 - Task Completion (3 minutes per task): After each agent completes, do a quick review:

  • Read the completion report
  • Spot-check one thing (e.g., test names make sense)
  • Approve or request fix

Level 2 - Phase Checkpoint (10-15 minutes per phase): After completing a logical phase (e.g., "Foundation" with 5 tasks), stop for deeper validation:

  • Run integration tests across all phase tasks
  • Verify patterns are consistent
  • Confirm phase goal is achieved
  • Tag with git tag feature-phase-1-complete

Level 3 - Feature Complete (30-45 minutes): After all tasks are done, do comprehensive validation:

  • Verify all acceptance criteria
  • Security review
  • Performance testing
  • Documentation check

For a 10-task feature split into two phases, using the upper end of each estimate:

  • Level 1: 10 × 3min = 30 minutes
  • Level 2: 2 × 15min = 30 minutes
  • Level 3: 45 minutes
  • Total: ~1.75 hours of validation
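
The same arithmetic as a short Python sketch, so you can plug in your own task and phase counts. The function name and its defaults are illustrative; the defaults use the upper-end estimates from the three levels above.

```python
def validation_minutes(tasks: int, phases: int,
                       per_task: int = 3, per_phase: int = 15, final: int = 45) -> int:
    """Total review time: Level 1 per task, Level 2 per phase, Level 3 once."""
    return tasks * per_task + phases * per_phase + final


total = validation_minutes(tasks=10, phases=2)
print(total, "minutes =", total / 60, "hours")  # → 105 minutes = 1.75 hours
```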

This is predictable and efficient compared to ad-hoc review.

Architectural Advantages Beyond Context

While preventing context decay is the primary benefit, orchestration provides additional advantages:

Modularity: Each task is an independent unit with clear inputs and outputs.

Testability: You can validate each task separately before moving to the next.

Maintainability: Clear boundaries between tasks make the codebase easier to understand.

Scalability: You can add more agents for parallel work (different features can run simultaneously).

Consistency: The same process applies to every task—no variation in workflow.

Reproducibility: Given the same specification and tasks, you get more predictable results than in a single long session.

Summary

In this lesson, you learned:

  • Context decay is both a probability problem and a context-window limitation in long sessions executing many tasks
  • Agent orchestration solves this with fresh subagents per task
  • Spawning subagents with specific roles delegates work to defined agent configurations
  • Subagents automatically get AGENTS.md, referenced files, and task instructions
  • Agent roles use TOML configuration to define identity and capabilities
  • Structured completion reports make validation efficient
  • Three-level validation (task, phase, feature) provides systematic quality control

In the upcoming tasks, you'll experience this firsthand by executing the same feature both ways and comparing the results.
