Orchestrating Task Execution

Introduction: The Context Problem in Long Sessions

When you build a production API feature with 10 tasks, you face a critical choice: execute all tasks in one long session, or break them into independent units with fresh context each time. In a traditional approach, you might give Codex CLI a complete specification and say "implement all 10 tasks." This seems efficient: one instruction, one session, done.

However, there is a hidden risk, and it is both a probability problem and a technical limitation. Ten tasks mean ten opportunities to miss something. Even if each task independently has a 95% chance of being perfect, the chance that at least one of the ten contains an error is 1 - 0.95^10, or about 40%. Additionally, as the session progresses, you may hit context window limits, forcing earlier information to be dropped. By task 10, important validation rules or patterns from the beginning of the specification might be forgotten because of both accumulated conversation length and statistical likelihood.

With agent orchestration, you avoid this by design. Instead of one long session, a Main Agent coordinates the work. For each task, it delegates to a fresh Subagent that starts with a clean slate: AGENTS.md is auto-loaded, specification files are loaded when you reference them with @, and there is no accumulated context from previous tasks.
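The probability argument can be checked with a few lines of Python. This is a minimal sketch that assumes each task succeeds independently with the same probability; the 95% figure is the illustrative number used in this lesson, not a measured rate, and the function name is made up for this example.

```python
def prob_at_least_one_error(per_task_success: float, num_tasks: int) -> float:
    """Chance that at least one of num_tasks independent tasks contains an error."""
    # All tasks succeed with probability p ** n, so the complement is the
    # probability that one or more tasks go wrong.
    return 1.0 - per_task_success ** num_tasks


if __name__ == "__main__":
    # Assumed per-task success rate of 95%, as in the text.
    for n in (1, 5, 10, 20):
        p = prob_at_least_one_error(0.95, n)
        print(f"{n:>2} tasks: {p:.1%} chance of at least one error")
```

For ten tasks the result is roughly 40%, which is why a per-task rate that sounds safe still leaves a long session likely to contain a mistake somewhere.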

How Orchestration Works: The Main Agent Pattern

What Subagents Automatically Receive

When you spawn a subagent, Codex CLI automatically provides:

- AGENTS.md: Your project instructions are auto-loaded. The subagent knows your coding standards, patterns, and conventions without you having to repeat them.
- Files you reference: Any file you reference with @ in your instruction is loaded into the subagent's context. For example: @specs/comments/specification.md or @src/models/task.py.
- Task instructions: The specific instruction you provide when spawning the agent.

What subagents do NOT get:

- Previous agent outputs (unless you explicitly include them)
- Accumulated conversation history from the main session
- Files not explicitly referenced with @

This design is intentional: it prevents context pollution and ensures each task gets exactly the context it needs, no more, no less.
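To make the three inputs concrete, here is a conceptual sketch of how a fresh context could be assembled. It is not Codex CLI's actual implementation: the function names, the regular expression for @ references, and the dictionary layout are all assumptions chosen for illustration.

```python
import re
from pathlib import Path

# Hypothetical pattern: "@" followed by a path-like token.
AT_REFERENCE = re.compile(r"@([\w./-]+)")


def referenced_files(instruction: str) -> list[str]:
    """Return paths mentioned with @ in an instruction, in order, without duplicates."""
    seen: list[str] = []
    for path in AT_REFERENCE.findall(instruction):
        path = path.rstrip(".")  # drop a sentence-ending period
        if path and path not in seen:
            seen.append(path)
    return seen


def build_subagent_context(instruction: str, project_root: Path) -> dict[str, str]:
    """Collect only what a fresh subagent starts with; nothing else is carried over."""
    context: dict[str, str] = {}

    # 1. Project instructions are always included when present.
    agents_md = project_root / "AGENTS.md"
    if agents_md.exists():
        context["AGENTS.md"] = agents_md.read_text(encoding="utf-8")

    # 2. Only files explicitly referenced with @ are loaded.
    for rel in referenced_files(instruction):
        file_path = project_root / rel
        if file_path.exists():
            context[rel] = file_path.read_text(encoding="utf-8")

    # 3. The task instruction itself.
    context["task"] = instruction
    return context
```

Note what the sketch leaves out: there is no parameter for conversation history or previous agent outputs, so the only way to pass earlier results along is to include them in the instruction or reference them as files.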

Creating Agent Role Definitions

Structured Completion Reports

When a subagent finishes its task, it doesn't just say "done." It provides a structured completion report that makes human review straightforward:

    T001 complete.
    Validation:
    ✓ Tests: 5 passed
    ✓ Coverage: 94%
    ✓ Types: clean
    ✓ Acceptance criteria met
    Files modified:
    - src/models/comment.py
    - tests/unit/test_comment_model.py
    Ready for: git commit -m "feat(comments): Add Comment model (T001)"

This report tells you immediately:

- Did the tests pass?
- Is coverage adequate?
- Are types clean?
- What files changed?
- What commit message to use?

With this information, your review takes about 3 minutes rather than requiring you to investigate what happened.
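One way to think about the report is as a small data structure with a fixed set of fields. The sketch below models it as a Python dataclass; the class name, field names, and the 90% coverage threshold are assumptions made for illustration and are not part of Codex CLI.

```python
from dataclasses import dataclass, field


def _mark(ok: bool) -> str:
    """Return a check mark for a passing check and a cross otherwise."""
    return "✓" if ok else "✗"


@dataclass
class CompletionReport:
    task_id: str
    tests_passed: int
    tests_failed: int
    coverage_percent: int
    types_clean: bool
    criteria_met: bool
    files_modified: list[str] = field(default_factory=list)
    commit_message: str = ""

    def all_checks_pass(self, min_coverage: int = 90) -> bool:
        # min_coverage is an assumed threshold; use your project's own value.
        return (
            self.tests_failed == 0
            and self.coverage_percent >= min_coverage
            and self.types_clean
            and self.criteria_met
        )

    def render(self, min_coverage: int = 90) -> str:
        lines = [
            f"{self.task_id} complete.",
            "Validation:",
            f"{_mark(self.tests_failed == 0)} Tests: {self.tests_passed} passed",
            f"{_mark(self.coverage_percent >= min_coverage)} Coverage: {self.coverage_percent}%",
            f"{_mark(self.types_clean)} Types: {'clean' if self.types_clean else 'errors'}",
            f"{_mark(self.criteria_met)} Acceptance criteria "
            f"{'met' if self.criteria_met else 'not met'}",
            "Files modified:",
            *[f"- {path}" for path in self.files_modified],
        ]
        # Only suggest a commit when every check passes.
        if self.all_checks_pass(min_coverage):
            lines.append(f'Ready for: git commit -m "{self.commit_message}"')
        return "\n".join(lines)
```

Because every report has the same fields, the Level 1 review reduces to scanning for a failed mark and confirming the file list looks plausible.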

The Three-Level Validation Strategy

Professional orchestration uses strategic checkpoints instead of reviewing every detail of every task:

Level 1 - Task Completion (3 minutes per task): After each agent completes, do a quick review:

- Read the completion report
- Spot-check one thing (e.g., test names make sense)
- Approve or request a fix

Level 2 - Phase Checkpoint (10-15 minutes per phase): After completing a logical phase (e.g., "Foundation" with 5 tasks), stop for deeper validation:

- Run integration tests across all phase tasks
- Verify patterns are consistent
- Confirm the phase goal is achieved
- Tag with git tag feature-phase-1-complete

Level 3 - Feature Complete (30-45 minutes): After all tasks are done, do comprehensive validation:

- Verify all acceptance criteria
- Security review
- Performance testing
- Documentation check

For a 10-task feature:

- Level 1: 10 × 3 min = 30 minutes
- Level 2: 2 × 15 min = 30 minutes
- Level 3: 45 minutes
- Total: ~1.75 hours of validation

This is predictable and efficient compared to ad-hoc review.
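The time budget above is simple arithmetic, and a short helper makes it easy to re-estimate for features of other sizes. This is a sketch only: the function name is hypothetical, and the defaults are the upper-bound figures from this lesson (3, 15, and 45 minutes).

```python
def validation_minutes(
    num_tasks: int,
    num_phases: int,
    per_task: int = 3,
    per_phase: int = 15,
    feature_review: int = 45,
) -> int:
    """Total review time in minutes across the three validation levels."""
    level_1 = num_tasks * per_task      # quick review after every task
    level_2 = num_phases * per_phase    # deeper checkpoint after every phase
    level_3 = feature_review            # one comprehensive review at the end
    return level_1 + level_2 + level_3


if __name__ == "__main__":
    total = validation_minutes(num_tasks=10, num_phases=2)
    print(f"{total} minutes ({total / 60:.2f} hours)")  # 105 minutes (1.75 hours)
```

Level 1 scales with the number of tasks and Level 2 with the number of phases, while Level 3 is a fixed cost, so the estimate grows linearly as a feature gets larger.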

Architectural Advantages Beyond Context

While preventing context decay is the primary benefit, orchestration provides additional advantages:

- Modularity: Each task is an independent unit with clear inputs and outputs.
- Testability: You can validate each task separately before moving to the next.
- Maintainability: Clear boundaries between tasks make the codebase easier to understand.
- Scalability: You can add more agents for parallel work (different features can run simultaneously).
- Consistency: The same process applies to every task, with no variation in workflow.
- Reproducibility: Given the same specification and tasks, you get predictable results.

Summary

In this lesson, you learned:

- Context decay is a probability problem in long sessions executing many tasks
- Agent orchestration solves this with fresh subagents per task
- Spawning subagents with specific roles delegates work to defined agent configurations
- Subagents automatically get AGENTS.md, referenced files, and task instructions
- Agent roles use TOML configuration to define identity and capabilities
- Structured completion reports make validation efficient
- Three-level validation (task, phase, feature) provides systematic quality control

In the upcoming tasks, you'll experience this firsthand by executing the same feature both ways and comparing the results.