Introduction

Welcome to Mastering Advanced AI Tooling in Codex! This is the first lesson of the course. Throughout the course, we'll use Codex as a durable agent to help plan development work and automate workflows reliably.

In this lesson, we'll focus on the Codex CLI's execution modes—especially what changes when you run a prompt normally versus running it with exec, and why that difference matters for automation.

Understanding the Two CLI Modes

Before we dive into commands, let's establish a mental model.

Default mode: help and exploration

When you run a plain codex prompt, Codex can often behave like an agent: it may inspect your repository (for example, by listing directories, searching for filenames, or reading config files) to produce a better answer.

However, this mode does not provide a strict automation contract. You typically get:

  • A helpful response (often based on a mix of model knowledge and observed repo context)
  • No guarantee that any suggested results (like "tests pass") were actually executed
  • No reliable "task succeeded" or "task failed" signal that an automation pipeline can act on

Think of this as: "help me understand, plan, summarize, or suggest"—possibly with some repo exploration.

Exec mode: execution and verification

When you run codex exec, you're invoking Codex as an execution engine. It must perform real actions (running commands, editing files when requested), observe actual outputs, and finish with an exit code that represents the task outcome.

Think of this as: do it for real, and prove it with the environment.
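
To make the "exit code as outcome" idea concrete, here is a minimal Python sketch. It assumes only the usual convention that a command exits with 0 on success and non-zero on failure. The helper name task_succeeded is hypothetical, and a stand-in Python subprocess takes the place of a real codex exec call so the snippet runs anywhere.

```python
import subprocess
import sys
from typing import Sequence


def task_succeeded(argv: Sequence[str]) -> bool:
    """Run a command and decide success from its exit code alone."""
    completed = subprocess.run(argv)
    return completed.returncode == 0


# Stand-in commands that exit with a chosen code. In a real script the argv
# would be something like ["codex", "exec", "<your prompt>"] (illustrative).
passing = [sys.executable, "-c", "import sys; sys.exit(0)"]
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]

print(task_succeeded(passing))  # True
print(task_succeeded(failing))  # False
```

The script never reads the command's text output to decide the outcome. That is the property automation relies on.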

Default Mode: Helpful Answers Without Automation Guarantees

Suppose we have failing tests in our JavaScript project, and we want Codex to analyze them and suggest a fix:

Note: --yolo is shorthand for "You Only Live Once." It is a safety bypass flag that tells Codex to execute commands immediately without stopping to ask you for permission. In the Codex CLI it is an alias for --dangerously-bypass-approvals-and-sandbox, so it also turns off sandboxing. This is useful for fast iteration during development or when running Codex in automated scripts.

What happens behind the scenes (in default mode) typically looks like this:

  • Codex receives your prompt.
  • It may explore the repository (search for tests, skim files, read configs) to improve its answer.
  • It generates a response that appears in your terminal as text.

The output might look something like this:

Key point: even if Codex explored your repo to form this answer, default mode still does not guarantee that:

  • tests were actually executed,
  • the failure summary corresponds to a real test run,
  • the suggested fix was validated.
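
One way to act on this is to treat any status mentioned in a default-mode answer as a claim, and to confirm it with a real run. The sketch below is illustrative only: confirm_claim is a hypothetical helper, the substring check on the answer text is deliberately naive, and a stand-in subprocess replaces the project's real test command (for example, yarn test).

```python
import subprocess
import sys
from typing import Sequence


def confirm_claim(answer_text: str, test_argv: Sequence[str]) -> str:
    """Compare what an answer says about the tests with a real test run."""
    # Naive heuristic (assumption): the answer claims success if it
    # literally contains the phrase "tests pass".
    claimed_pass = "tests pass" in answer_text.lower()
    # Ground truth comes from actually running the test command.
    actual_pass = subprocess.run(test_argv, capture_output=True).returncode == 0
    if claimed_pass and not actual_pass:
        return "claim contradicted by real run"
    if actual_pass:
        return "verified passing"
    return "verified failing"


# Stand-in for a test command that fails.
failing_tests = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(confirm_claim("After this fix, all tests pass.", failing_tests))
# claim contradicted by real run
```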

You should treat default mode as high-quality assistance, not a CI-grade signal. It's excellent for learning, exploration, getting explanations, brainstorming approaches, or receiving code snippets that you will review and apply yourself.

Exec Mode: Real Execution and Verification

Now let's explore the second interaction style: exec mode. When we use codex exec, Codex must actually execute commands, observe real results, and verify outcomes against the real state of your repository.

Here's how we use exec mode to have Codex run tests and report failures:

Notice the difference: we use exec, which signals that we want execution semantics (real commands, real outputs, real exit codes) rather than just a helpful answer.
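
When an exec invocation is launched from a script, it helps to assemble the command as an argument list. The following is a minimal sketch, not an official API: build_exec_argv is a hypothetical helper, the prompt text is an example, and the placement of --yolo directly after exec is an assumption based on how this lesson describes the flag.

```python
from typing import List


def build_exec_argv(prompt: str, bypass_approvals: bool = False) -> List[str]:
    """Assemble an argv list for a `codex exec` call (illustrative)."""
    argv = ["codex", "exec"]
    if bypass_approvals:
        # --yolo is the approval-bypass flag described in this lesson.
        argv.append("--yolo")
    # The prompt stays a single argument, so no shell quoting is needed.
    argv.append(prompt)
    return argv


print(build_exec_argv("Run the test suite and report failures", bypass_approvals=True))
# ['codex', 'exec', '--yolo', 'Run the test suite and report failures']
```

Passing the list to subprocess.run without shell=True keeps the prompt intact even when it contains quotes or spaces.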

When we run this command, several things happen:

  • Codex interprets our request and formulates a plan using actual available commands.
  • It executes the test command in your real environment (e.g., yarn test or yarn jest depending on the repo).
  • It observes the actual exit code and output streams.
  • It processes real results and formats them according to our request.
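
The "observe, then report" part of these steps can be sketched in Python. This is an illustration of the pattern, not a description of Codex internals: Observation, observe, and summarize are hypothetical names, and the command being observed is a stand-in that imitates a failing test run.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Observation:
    exit_code: int
    stdout: str
    stderr: str


def observe(argv: Sequence[str]) -> Observation:
    """Run a command and record its exit code and both output streams."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return Observation(completed.returncode, completed.stdout, completed.stderr)


def summarize(obs: Observation) -> str:
    """Build a one-line report from what was actually observed."""
    status = "PASSED" if obs.exit_code == 0 else f"FAILED (exit code {obs.exit_code})"
    error_lines = [line for line in obs.stderr.splitlines() if line.strip()]
    detail = error_lines[-1] if error_lines else "no stderr output"
    return f"{status}: {detail}"


# Stand-in for a test command that writes to both streams and exits with 1.
fake_test_run = [
    sys.executable, "-c",
    "import sys; print('1 test ran'); print('1 failed', file=sys.stderr); sys.exit(1)",
]
print(summarize(observe(fake_test_run)))
# FAILED (exit code 1): 1 failed
```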

The output might include actual test execution logs:

Unlike a best-effort narrative, exec mode is accountable to the environment. If tests don't exist, the command fails. If the environment is misconfigured, it fails. If tests fail, it returns a non-zero exit code. This makes exec mode ideal for running yarn test in pipelines, running yarn build to ensure compilation after refactoring, applying changes and verifying them with real commands, or any workflow where "probably correct" isn't acceptable.
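
Because each step reports its outcome through an exit code, a pipeline can stop at the first failure. Here is a minimal sketch of that gating logic, assuming each step is an ordinary command. run_pipeline and stand_in are hypothetical helpers, and the stand-in commands take the place of real steps such as yarn test or yarn build.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_pipeline(steps: Sequence[Tuple[str, List[str]]]) -> int:
    """Run steps in order; return the first non-zero exit code, else 0."""
    for name, argv in steps:
        code = subprocess.run(argv).returncode
        if code != 0:
            print(f"{name}: failed with exit code {code}, stopping")
            return code
        print(f"{name}: ok")
    return 0


def stand_in(exit_code: int) -> List[str]:
    """Build a command that simply exits with the given code."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


steps = [("test", stand_in(0)), ("build", stand_in(2)), ("later-step", stand_in(0))]
print(run_pipeline(steps))
# test: ok
# build: failed with exit code 2, stopping
# 2
```

The third step never runs, which is the behavior a CI system needs from a failed gate.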

Because exec can cause real side effects, use it in a workflow where changes can be reviewed and reverted (feature branches, clean working trees, etc.). --yolo removes approval prompts, so it's powerful—but also removes a safety check.
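
A simple guard for the "clean working tree" advice is to check git before launching an exec run. The sketch below assumes the project is a git repository with git on the PATH. is_clean and working_tree_is_clean are hypothetical helper names, and src/app.js is an example path.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints nothing when there are no changes."""
    return porcelain_output.strip() == ""


def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """Ask git whether the working tree has modified or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return is_clean(result.stdout)


print(is_clean(""))                # True
print(is_clean(" M src/app.js\n"))  # False
```

A wrapper script could call working_tree_is_clean() first and refuse to start when it returns False, so every change made during the run shows up as a reviewable diff.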

The Real Difference Between Assistance and Automation

A common misconception is that codex and codex exec are just different prompt styles—like asking "what would you do?" versus "do it now."

In reality, the distinction is architectural:

  • Default codex mode can be agentic and can explore context, but it is not designed to be a strict automation boundary. It's optimized for producing a useful response—even if the response includes plans, summaries, and suggestions.
  • codex exec provides the semantics you need for automation: the task is run against reality, with real side effects and verifiable success or failure via exit codes.

Another way to say it:

  • Default mode is assistance (possibly informed by exploration).
  • Exec mode is automation-grade execution.

The Task Completion Contract

Exec mode establishes a task completion contract:

  • Commands are actually executed and validated against the filesystem.
  • The CLI returns meaningful exit codes (e.g., non-zero if tests fail).
  • It works in non-interactive contexts (scripts, CI/CD), especially when paired with --yolo.

Note: exec mode does not guarantee that Codex will choose the exact same plan every run (model reasoning can vary). The guarantee is that whatever plan it chooses, it must be executed and measured in the real environment.

Comparing Execution Guarantees and Best-Effort Help

| Dimension | Default codex Mode | codex exec Mode |
| --- | --- | --- |
| Purpose | Help: explanations, summaries, suggestions (may explore repo) | Complete tasks through actual execution |
| Execution Guarantees | No strict completion contract | Real side effects and verification |
| Automation Suitability | Limited (not a reliable success/failure boundary) | Excellent (scriptable, CI/CD-ready) |
| Exit Behavior | Typically exits successfully if the request was handled (not a task exit code) | Returns task-relevant exit codes (e.g., 1 if tests fail) |

Summary and Next Steps

You now have the core CLI model:

  • Default codex: helpful answers (often informed by repo exploration), but not an automation-grade success/failure contract.
  • codex exec: execution, verification, and exit codes—built for automation.

In the next section, we'll put this into practice with hands-on exercises that reinforce when to use each mode and how to think in terms of execution contracts.
