Welcome back to Mastering Advanced AI Tooling in Codex. In the previous unit, we explored the two primary CLI execution modes: single-shot commands for quick advice and exec mode for autonomous action.
Those tools are great for interactive work, but now we’ll take the next step: programmatically driving the same “run → validate → react” loop using the Python SDK, so the workflow becomes durable automation you can run in CI, cron jobs, or internal tooling.
If codex exec already automates execution, why write Python scripts at all?
The CLI is a convenience wrapper around agent execution—optimized for human interaction in a terminal. However, it offers limited composability, opaque prompt packaging, and no native integration with your existing Python tooling.
By contrast, the SDK provides:
- Pipeline embedding: Call Codex from within CI/CD jobs, test runners, or data workflows
- Structured integration: Parse outputs, chain calls, and combine results with other Python logic
- Prompt control: Explicitly engineer instructions and contexts rather than relying on CLI defaults
- Composability: Mix model intelligence with file I/O, API calls, and custom validation logic
When you need repeatable automation—not just an interactive assistant—the SDK is the right abstraction.
Manual CLI interactions require your presence, and the context of each session disappears when the terminal closes. That’s fine for ad-hoc help, but it breaks down when you need a repeatable execution contract.
In this lesson, we’ll implement a lightweight programmable version of exec:
- Run real commands (tests/build/lint)
- Check exit codes
- Only consult the model when something fails
- Persist a fix plan as an artifact
This pattern is durable, inspectable, and easy to wire into existing automation.
We’ll use:
- `subprocess` to run commands and capture output
- `os` to create artifact directories
- `OpenAI` from the official SDK
Set this environment variable:
`OPENAI_API_KEY` (the SDK reads it automatically)
We’ll define the model name directly in code to keep execution consistent across environments.
Here’s a minimal test runner function. The exact command depends on your repo, but this example shows a common pattern for Yarn-based setups and explicitly disables watch mode.
Important detail: don’t mix runner-specific flags (for example, --runInBand is a Jest flag and won’t apply to other runners like Vitest).
Next, we consult the model only if tests fail. Many current “Codex-optimized” model variants are served via the Responses API, so we use client.responses.create(...).
We also:
- Truncate output to reduce context-limit risk
- Extract returned text robustly (Responses can return structured output)
- Wrap the call in error handling so automation doesn’t crash if the service is unavailable
When automation runs in CI/CD, nothing persists after the job completes unless you explicitly save it. Printing to stdout might help during debugging, but once the pipeline finishes, that output is buried in logs—hard to search, hard to share, and impossible to feed into downstream steps.
By writing the model's fix plan to a file artifact, you make it:
- Inspectable: Developers can download and read it without scrolling through thousands of log lines
- Portable: Attach it to pull request comments, email it to on-call engineers, or feed it to another automation stage
- Traceable: Each run produces a dated, versioned record of what the model suggested—critical for auditing when fixes are applied
- Actionable in pipelines: CI systems (GitHub Actions, GitLab CI, Jenkins) have native support for uploading artifacts. You can configure the pipeline to fail the job, attach fix_plan.md, and notify relevant teams—all without manual intervention.
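A minimal artifact writer covering those points might look like this (the directory, filename, and timestamp format are choices for illustration, not requirements):

```python
import os
from datetime import datetime, timezone

def write_fix_plan(plan: str, directory: str = "artifacts") -> str:
    """Write the model's fix plan to a dated markdown artifact; return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "fix_plan.md")
    # A UTC timestamp in the header gives each run a traceable record
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"# Fix plan ({stamp})\n\n{plan}\n")
    return path
```

Returning the path makes it easy for the calling script to print the artifact location or pass it to an upload step.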
In a typical CI workflow:
- Tests fail
- This script generates artifacts/fix_plan.md
- The CI job uploads the artifact (for example, actions/upload-artifact in GitHub Actions)
- Developers review the plan directly from the UI, or another job uses it to auto-generate a draft PR
This pattern transforms ephemeral model output into a durable, actionable asset.
You now have a minimal, durable automation pattern that mirrors the core behavior of exec, but in a form you can embed anywhere: run real commands, check exit codes, consult the model only on failure, and write a persistent artifact for review.
In the practice section, you’ll implement and extend this loop—tightening prompts, improving truncation/formatting, and wiring the output into your workflow (for example, generating structured plans, tagging owners, or attaching artifacts in CI).
