Introduction

Welcome back to Mastering Advanced AI Tooling in Codex. In the previous unit, we explored the two primary CLI execution modes: single-shot commands for quick advice and exec mode for autonomous action.

Those tools are great for interactive work, but now we’ll take the next step: programmatically driving the same “run → validate → react” loop using the Python SDK, so the workflow becomes durable automation you can run in CI, cron jobs, or internal tooling.

Why Not Just Use "codex exec"?

If codex exec already automates execution, why write Python scripts at all?

The CLI is a convenience wrapper around agent execution—optimized for human interaction in a terminal. However, it offers limited composability, opaque prompt packaging, and no native integration with your existing Python tooling.

By contrast, the SDK provides:

  • Pipeline embedding: Call Codex from within CI/CD jobs, test runners, or data workflows
  • Structured integration: Parse outputs, chain calls, and combine results with other Python logic
  • Prompt control: Explicitly engineer instructions and contexts rather than relying on CLI defaults
  • Composability: Mix model intelligence with file I/O, API calls, and custom validation logic

When you need repeatable automation—not just an interactive assistant—the SDK is the right abstraction.

From Interactive to Automated: A Programmable "exec" Loop

Manual CLI interactions require your presence, and the context of each session disappears when the terminal closes. That’s fine for ad-hoc help, but it breaks down when you need a repeatable execution contract.

In this lesson, we’ll implement a lightweight programmable version of exec:

  1. Run real commands (tests/build/lint)
  2. Check exit codes
  3. Only consult the model when something fails
  4. Persist a fix plan as an artifact

This pattern is durable, inspectable, and easy to wire into existing automation.
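The four steps above can be sketched as one driver function. This is a minimal sketch: the three helpers are passed in as callables because we haven't defined them yet, and their exact shapes (exit code plus captured output, plan text, and a save step) are assumptions this lesson builds on.

```python
from typing import Callable


def run_pipeline(
    run_tests: Callable[[], tuple[int, str]],
    consult_model: Callable[[str], str],
    save_plan: Callable[[str], None],
) -> int:
    """Run -> validate -> react: consult the model only on failure."""
    exit_code, output = run_tests()  # 1. run real commands
    if exit_code == 0:               # 2. check exit codes
        return 0
    plan = consult_model(output)     # 3. consult the model only when something fails
    save_plan(plan)                  # 4. persist the fix plan as an artifact
    return exit_code
```

Passing the helpers as parameters also makes the loop trivial to unit-test with stubs before any real commands or API calls are wired in.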

Foundation and Imports

We’ll use:

  • subprocess to run commands and capture output
  • os to create artifact directories
  • OpenAI from the official SDK

Set this environment variable:

  • OPENAI_API_KEY (the SDK reads it automatically)

We’ll define the model name directly in code to keep execution consistent across environments.
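Taken together, the setup might look like this. The model name below is a placeholder, not a guarantee of availability; substitute whatever Codex-capable model your account exposes.

```python
import os
import subprocess

from openai import OpenAI

# The SDK reads OPENAI_API_KEY from the environment automatically.
client = OpenAI()

# Pin the model in code so every environment runs the same configuration.
# "gpt-5-codex" is a placeholder; use the model name your account exposes.
MODEL = "gpt-5-codex"
```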

Capturing Test Failures

Here’s a minimal test runner function. The exact command depends on your repo, but this example shows a common pattern for Yarn-based setups and explicitly disables watch mode.

Important detail: don’t mix runner-specific flags (for example, --runInBand is a Jest flag and won’t apply to other runners like Vitest).

Consulting Codex for Intelligence (Responses API)

Next, we consult the model only if tests fail. Many current “Codex-optimized” model variants are served via the Responses API, so we use client.responses.create(...).

We also:

  • Truncate output to reduce context-limit risk
  • Extract returned text robustly (Responses can return structured output)
  • Wrap the call in error handling so automation doesn’t crash if the service is unavailable
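Those three concerns might come together like this. This is a sketch: the client argument is assumed to be an OpenAI() instance, and the 20,000-character truncation limit is an arbitrary choice you should tune to your model's context window.

```python
def consult_model(client, model: str, test_output: str, max_chars: int = 20_000) -> str:
    """Ask the model for a fix plan; never let an API error kill automation."""
    # Truncate from the end: failure summaries usually appear last.
    excerpt = test_output[-max_chars:]
    prompt = (
        "The following test run failed. Propose a concrete, step-by-step "
        "fix plan.\n\n"
        f"```\n{excerpt}\n```"
    )
    try:
        response = client.responses.create(model=model, input=prompt)
        # output_text flattens the structured Responses output into plain text.
        return response.output_text
    except Exception as exc:
        # Degrade gracefully: automation shouldn't crash if the service is down.
        return f"Model consultation failed: {exc}"
```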

Persisting the Results

When automation runs in CI/CD, nothing persists after the job completes unless you explicitly save it. Printing to stdout might help during debugging, but once the pipeline finishes, that output is buried in logs—hard to search, hard to share, and impossible to feed into downstream steps.

By writing the model's fix plan to a file artifact, you make it:

  • Inspectable: Developers can download and read it without scrolling through thousands of log lines
  • Portable: Attach it to pull request comments, email it to on-call engineers, or feed it to another automation stage
  • Traceable: Each run produces a dated, versioned record of what the model suggested—critical for auditing when fixes are applied
  • Actionable in pipelines: CI systems (GitHub Actions, GitLab CI, Jenkins) have native support for uploading artifacts. You can configure the pipeline to fail the job, attach fix_plan.md, and notify relevant teams—all without manual intervention.
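The persistence step itself is small. A minimal sketch, assuming an artifacts/ directory relative to the working directory and a fix_plan.md filename (both conventions you can change):

```python
import os


def save_plan(plan: str, path: str = "artifacts/fix_plan.md") -> str:
    """Write the model's fix plan to a durable artifact file."""
    directory = os.path.dirname(path)
    if directory:
        # exist_ok=True makes repeated runs idempotent.
        os.makedirs(directory, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Proposed Fix Plan\n\n")
        f.write(plan)
    return path
```

Returning the path lets the caller print it or pass it to a later pipeline stage, such as an artifact-upload step.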

In a typical CI workflow:

  1. Tests fail
  2. This script generates artifacts/fix_plan.md
  3. The CI job uploads the artifact (for example, actions/upload-artifact in GitHub Actions)
  4. Developers review the plan directly from the UI, or another job uses it to auto-generate a draft PR

This pattern transforms ephemeral model output into a durable, actionable asset.

Conclusion and Next Steps

You now have a minimal, durable automation pattern that mirrors the core behavior of exec, but in a form you can embed anywhere: run real commands, check exit codes, consult the model only on failure, and write a persistent artifact for review.

In the practice section, you’ll implement and extend this loop—tightening prompts, improving truncation/formatting, and wiring the output into your workflow (for example, generating structured plans, tagging owners, or attaching artifacts in CI).
