Subagents are the workhorse of reliable multi-step automation in Codex. Instead of asking one agent to understand everything, decide everything, and change everything, you spawn small, isolated workers that each do one job under tight constraints—and report back in a format you can automatically validate.
This lesson focuses on two foundations:
- Subagent isolation: keep the worker narrowly scoped so it can't roam across the repo or take unintended actions.
- Output contracts: make the worker's response predictable (ideally strict JSON) so you can safely chain steps together.
A subagent is an isolated Codex run created for a single task. The isolation is the point: the subagent is easiest to reason about when it's given a small "world" and a clear objective.
In practice, a good subagent has:
- A scope ("only analyze this directory")
- A task ("identify missing unit tests")
- A policy ("don't apply changes automatically")
- A contract ("return strict JSON with these fields")
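Put together, those four elements can be written as one prompt skeleton (a sketch; the exact wording is yours to adapt):

```
Analyze ONLY files under <scope>.                # scope
<one clearly stated task>.                       # task
Do NOT modify any files; report only.            # policy
Return STRICT JSON, no markdown, no prose,       # contract
exactly one JSON object with the agreed fields.
```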
When you keep those explicit, you get runs that are easier to audit, debug, and compose into pipelines.
If you ask an agent to "improve test coverage," it might:
- read far outside the area you intended (because it wants more context),
- make large refactors (because it notices style issues),
- write code when you only wanted a report.
Isolation doesn't magically prevent mistakes, but it limits the blast radius and makes behavior easier to validate. You can treat each subagent like a bounded operation: "it looked only here," "it returned only this shape," "it changed nothing."
That last part is where output contracts become your enforcement mechanism.
Subagents are most useful when their output can be consumed by something else—another step, a script, a CI job, or a human who wants a clean checklist.
Free-form prose is pleasant, but brittle for automation. An output contract is a strict agreement that the subagent will output data in a known structure—typically JSON.
The rule of thumb is simple: if you plan to chain this result into anything, don't accept "helpful explanation." Accept parseable output.
That's why the contract you'll use in this lesson insists on:
- STRICT JSON
- no markdown
- no prose
- exactly one JSON object
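On the consuming side, "parseable output" means you can enforce the contract mechanically. A minimal sketch in Python (the field names follow the contract used later in this lesson):

```python
import json

REQUIRED_FIELDS = {"status", "summary", "findings", "files_read", "files_modified"}
ALLOWED_STATUS = {"success", "partial", "failed"}

def parse_contract(raw: str) -> dict:
    """Accept exactly one JSON object matching the contract; reject everything else."""
    try:
        # json.loads fails on prose, markdown fences, or trailing text around the object
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not strict JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("expected exactly one JSON object")
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if payload["status"] not in ALLOWED_STATUS:
        raise ValueError(f"bad status: {payload['status']!r}")
    return payload
```

Because `json.loads` rejects anything that is not a single JSON value, a "helpful explanation" wrapped around the object fails fast instead of silently corrupting the next step.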
Start with the safety baseline:
The -a flag controls approval prompts (untrusted | on-request | never). With -a never, Codex does not stop to ask for approval. This means if the sandbox mode allows writes, Codex may write files without pausing.
Important: -a never controls whether Codex pauses for approval—it does not control whether Codex has write capability. Write capability comes from the sandbox mode (controlled by -s).
Sandbox mode defines capability boundaries:
The -s flag sets the sandbox mode. workspace-write means the subagent has the capability to modify files. With -a never, you're saying Codex won't pause for approval—so if it decides to write, it will write immediately.
This separation matters because later you might reuse the same task framing but rely on your prompt's "analysis only" instruction—or you might switch to a stricter sandbox mode when you want guarantees.
If you need strict read-only behavior, set -s read-only explicitly. If you do use -s workspace-write, treat the run as "write-capable," and verify your expectations via your contract plus post-run validation (e.g., checking a diff / changed files).
Here's the full, concrete subagent command. This is the pattern you'll reuse constantly:
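A sketch of its shape (the flags come from the discussion above; the prompt wording and schema field values are illustrative, so adapt them to your repo):

```shell
codex exec -s workspace-write -a never '
Analyze ONLY files under packages/utils/.
Identify utility functions missing unit tests.
Return STRICT JSON, no markdown, no prose, exactly one JSON object:
{
  "status": "success" | "partial" | "failed",
  "summary": "<one line>",
  "findings": [{"function": "<name>", "file": "<path>"}],
  "files_read": ["<path>"],
  "files_modified": []
}'
```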
Read it as three blocks inside exec:
- Scope constraint: "Analyze ONLY files under packages/utils/." This is your "fence." Put it first, and keep it blunt.
- Task: "Identify utility functions missing unit tests." One job, stated plainly.
- Output contract: the JSON schema tells the subagent exactly what to emit—no extra commentary.
Understanding Terminal Output
When running `exec` in a terminal, you might see the JSON output appear twice. This is expected behavior. The CLI first shows the Raw Output from the model, followed by Metadata (tokens used), and finally the Return Value. The "Return Value" is the CLI confirming it successfully extracted the valid JSON payload from the response, ensuring it is ready to be passed to the next agent or script.
The uppercase ONLY is not decoration—it's a reliability tool. Models tend to broaden scope when they feel uncertain ("let me check one more file…"). Clear constraints reduce that drift.
If you want to make the isolation even more robust, you can add a clause like:
- "If you need files outside scope, set `status` to `partial` and explain in `summary`."
That gives the model a safe way to admit it's blocked without breaking the fence.
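For example, a blocked-but-honest response might look like this (the paths and wording are illustrative):

```json
{
  "status": "partial",
  "summary": "Tests appear to live outside packages/utils/; could not verify coverage within scope.",
  "findings": [],
  "files_read": ["packages/utils/date.ts", "packages/utils/slugify.ts"],
  "files_modified": []
}
```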
Your contract:
status
A compact decision signal for the supervisor:
- success: done, within constraints
- partial: did some work, but hit a limitation (scope restriction, missing info, ambiguity)
- failed: couldn't meaningfully complete the task
summary
A short human-readable line that explains what happened. Keep it a string so it remains machine-friendly.
findings
Your payload: a list of utilities missing unit tests. In practice, each entry should be specific enough to act on (function name + file path is a good minimum).
files_read
This is your audit trail. It's also how you verify that "Analyze ONLY …" was respected.
files_modified
This is a safety check and a consistency mechanism. For an analysis-style run, files_modified is expected to be an empty array ([]). However, this is not a guarantee derived from -a never alone—especially if the subagent is running with -s workspace-write. If you need strict "no writes," use -s read-only and/or validate post-run (e.g., reject results when files_modified is non-empty or when your workspace diff indicates changes).
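These two fields make the fence checkable after the fact. A sketch of that post-run audit (the scope prefix matches this lesson's example; the logic is a plain containment check):

```python
def audit_run(result: dict, scope: str = "packages/utils/") -> list[str]:
    """Return a list of violations; an empty list means the run respected the fence."""
    violations = []
    # Every file the subagent read must sit inside the declared scope.
    for path in result.get("files_read", []):
        if not path.startswith(scope):
            violations.append(f"read outside scope: {path}")
    # An analysis-only run must not report modifications...
    if result.get("files_modified"):
        violations.append(f"unexpected writes: {result['files_modified']}")
    # ...and for a write-capable run, also cross-check with an actual diff
    # (e.g. `git diff --name-only`) rather than trusting the report alone.
    return violations
```

Rejecting a result on any violation keeps the "it looked only here, it changed nothing" claim verifiable instead of assumed.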
When you run the command, the subagent should:
- read only under packages/utils/
- identify utility functions that appear to lack unit tests
- return exactly one JSON object matching the contract
- report every file it read
- not apply changes in this analysis step (it should return files_modified: []), and validate this expectation post-run if the subagent is write-capable.
If it can't confidently map "utility functions" to "tests" due to missing conventions or missing files within scope, the correct behavior is not to improvise across the repo. The correct behavior is status: "partial" with an honest summary.
Subagents become reliable when you treat them less like chat and more like small, isolated programs:
- Isolation comes from explicit scope constraints + controlled autonomy.
- Output contracts turn results into a stable interface you can parse, validate, and chain.
In the next practices, you'll write a few variations of this same pattern—changing scope, changing tasks, and tightening contracts—so you can build multi-step workflows where each subagent is predictable and safe to compose.
