Building the Engineer Loop

Introduction

Welcome back to Mastering Advanced AI Tooling in Codex. You've mastered capturing test failures and consulting the model for fix suggestions. Now we'll take the next step: building a complete engineer loop that not only plans changes but actually applies them, verifies the results, and documents the outcome.

This transforms Codex from an advisor into an autonomous agent that can execute complete fix cycles suitable for continuous integration systems.

The Engineer Loop Architecture

A production-grade automation loop follows five distinct stages:

1. Collect Context - Gather repository state and test results
2. Generate Structured Patch - Get machine-readable fixes from the model
3. Apply Patch Safely - Modify source code using validated diffs
4. Verify with Tests - Confirm the fix actually works
5. Generate Report - Document everything for audit and review

Each stage produces artifacts stored in a timestamped directory, creating a complete audit trail. Let's build this step by step.
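As a minimal sketch of the timestamped directory idea, the helper below creates one folder per run. The function name `make_run_dir`, the UTC timestamp format, and the `artifacts` base folder are assumptions chosen for illustration, not names taken from the lesson's own code.

```python
from datetime import datetime, timezone
from pathlib import Path


def make_run_dir(base: Path) -> Path:
    """Create and return an absolute, timestamped directory for one run."""
    # Hypothetical naming scheme: a UTC timestamp such as 20250101T120000Z.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = (base / stamp).resolve()
    # parents=True also creates the base folder on the first run.
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


if __name__ == "__main__":
    print(make_run_dir(Path("artifacts")))
```

Resolving to an absolute path up front means later stages can hand the directory to git or a test runner regardless of their working directory.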

Collecting Repository Context

Effective automation requires precise understanding of the current state. We need two pieces of information: uncommitted changes already in the working directory and the output of failing tests.

By combining git diff with test execution, we give the model complete visibility into what's already modified versus what needs fixing. This context enables more precise patch generation.
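One way to gather both pieces of context is sketched below. The helper names `run` and `collect_context` and the dictionary keys are illustrative assumptions. The test command is passed in by the caller because the lesson does not fix a specific test runner.

```python
import subprocess
from pathlib import Path


def run(cmd: list[str], cwd: Path) -> subprocess.CompletedProcess:
    """Run a command, capturing output as text, without raising on failure."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)


def collect_context(repo_root: Path, test_cmd: list[str]) -> dict:
    """Gather uncommitted changes and the current test results."""
    diff = run(["git", "diff"], repo_root)  # what is already modified
    tests = run(test_cmd, repo_root)        # what still needs fixing
    return {
        "git_diff": diff.stdout,
        "test_output": tests.stdout + tests.stderr,
        "test_exit_code": tests.returncode,
    }
```

Keeping the exit code alongside the raw output lets later stages decide whether there is anything to fix without re-parsing the test log.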

Generating Structured Patches

The key insight for automation is requesting structured outputs instead of freeform text. We instruct the model to return only valid JSON with exactly two fields, plan and diff, so the response is machine-readable and can be parsed without guesswork. We then validate the response by running json.loads(), with a small best-effort recovery if the model accidentally wraps the JSON in extra text.
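The validation and recovery step could look roughly like this. The function name `parse_patch_response` and the recovery heuristic (taking the outermost pair of braces) are assumptions, one simple way to implement "best-effort recovery" rather than the lesson's exact code.

```python
import json


def parse_patch_response(raw: str) -> dict:
    """Parse a model reply into a dict with string fields 'plan' and 'diff'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Best-effort recovery: the model may have wrapped the JSON in prose
        # or a code fence, so retry on the outermost {...} span.
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model response")
        data = json.loads(raw[start : end + 1])

    if not isinstance(data, dict) or set(data) != {"plan", "diff"}:
        raise ValueError("expected exactly the fields 'plan' and 'diff'")
    if not all(isinstance(data[key], str) for key in ("plan", "diff")):
        raise ValueError("'plan' and 'diff' must both be strings")
    return data
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, callers can catch `ValueError` alone to handle every rejection path.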

There's one more practical concern: the model has no way to know the exact layout of your repo. If it guesses the wrong path in the diff headers (e.g. packages/math/point.ts instead of packages/math/src/point.ts), git apply will reject the patch with No such file or directory. To avoid this, we read the source file from disk and inline it into the prompt, and we tell the model exactly which path to use in the diff headers.
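A prompt builder along these lines inlines the file and pins the path. The exact prompt wording, the section markers, and the name `build_prompt` are illustrative assumptions; only the idea of reading the file from disk and stating the path explicitly comes from the text above.

```python
from pathlib import Path


def build_prompt(repo_root: Path, rel_path: str, test_output: str, git_diff: str) -> str:
    """Build a prompt that inlines the source file and pins the diff path."""
    source = (repo_root / rel_path).read_text(encoding="utf-8")
    return "\n".join(
        [
            "Fix the failing test. Respond with ONLY a JSON object that has",
            'exactly two string fields: "plan" and "diff".',
            '"diff" must be in unified diff format.',
            # Telling the model the path avoids guessed directory layouts.
            f"Use exactly these paths in the diff headers: a/{rel_path} and b/{rel_path}",
            "",
            f"=== Current contents of {rel_path} ===",
            source,
            "",
            "=== Uncommitted changes (git diff) ===",
            git_diff or "(none)",
            "",
            "=== Failing test output ===",
            test_output,
        ]
    )
```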

Understanding Unified Diff Format

Before we apply patches, it's critical to understand the format git expects. A unified diff shows the changes between two versions of a file, and its format breaks down as follows:

Header (diff --git...): Identifies which file changed
Index line: Git metadata about the change (blob hashes and file mode)
File markers (--- old, +++ new): Show before/after filenames
Hunk header (@@ -12,7 +12,7 @@): Starting line and line count of the hunk in the old file, then in the new file
Change lines: Lines starting with - are removed, lines starting with + are added, and lines starting with a space are unchanged context

This format is what git apply expects. The model produces it when we request a "unified diff format" in our instructions, though not always flawlessly. The beauty of this approach is that git handles all the actual file modification logic, so we just validate and execute.
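To make the pieces of the format concrete, the sketch below embeds a small hypothetical diff and checks that its hunk header agrees with its body. The file contents, the function names in the sample, and the index hashes are invented for illustration; only the path and the @@ -12,7 +12,7 @@ header echo the examples used in this lesson.

```python
import re

# Hypothetical diff: the TypeScript lines and index hashes are made up.
SAMPLE_DIFF = """\
diff --git a/packages/math/src/point.ts b/packages/math/src/point.ts
index 1a2b3c4..5d6e7f8 100644
--- a/packages/math/src/point.ts
+++ b/packages/math/src/point.ts
@@ -12,7 +12,7 @@
 export function pointDistance(a: Point, b: Point): number {
   const dx = b[0] - a[0];
   const dy = b[1] - a[1];
-  return Math.sqrt(dx * dx - dy * dy);
+  return Math.sqrt(dx * dx + dy * dy);
 }
 // hypothetical neighbouring code
 export function pointCenter(a: Point, b: Point): Point {
"""

# @@ -old_start[,old_count] +new_start[,new_count] @@ (a count defaults to 1)
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_line_counts(diff: str) -> list[dict]:
    """Compare each hunk header's declared counts with the lines that follow."""
    hunks: list[dict] = []
    current = None
    for line in diff.splitlines():
        match = HUNK_RE.match(line)
        if match:
            current = {
                "header": line,
                "old_declared": int(match.group(2) or 1),
                "new_declared": int(match.group(4) or 1),
                "old_seen": 0,
                "new_seen": 0,
            }
            hunks.append(current)
        elif current is None:
            continue  # file headers before the first hunk
        elif line.startswith("diff --git"):
            current = None  # next file: skip its headers too
        elif line.startswith(" "):  # context counts on both sides
            current["old_seen"] += 1
            current["new_seen"] += 1
        elif line.startswith("-"):  # removed line: old side only
            current["old_seen"] += 1
        elif line.startswith("+"):  # added line: new side only
            current["new_seen"] += 1
    return hunks
```

In the sample, six context lines plus one removed line give 7 old lines, and six context lines plus one added line give 7 new lines, matching the header. This checker assumes git-style diffs where each file starts with a diff --git line.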

Applying Patches with Precision

Once we have a diff, we use git apply to modify the actual source files. The critical safety measure is validating the patch with a dry run (git apply --check) before applying it, so a bad patch is rejected before it touches the working directory.

In a real repo setup (like excalidraw/), it's also important to:

Save the patch into the artifacts directory using an absolute path
Ensure the artifacts directory exists
Run git from the repository root (not from wherever the script happens to be invoked)

LLM-generated diffs are also frequently cosmetically imperfect: miscounted hunk headers (@@ -12,7 +12,7 @@ when the hunk is really 8 lines) and stray whitespace are extremely common, and vanilla git apply typically rejects such patches, for example with corrupt patch at line N when the counts are wrong. We pass two extra flags to make git apply tolerant of these defects without sacrificing safety:

--recount ignores the line counts in hunk headers and infers them from the patch body
--whitespace=fix corrects whitespace errors (such as trailing whitespace) while applying

Git still requires context lines to match the real file content, so the patch can't silently apply somewhere it shouldn't. These flags just stop punishing the model for cosmetic mistakes that don't change what the patch means.
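Putting the points of this section together, a validate-then-apply helper might look like the sketch below. The function name `apply_patch` and the file names `fix.patch` and `apply_error.txt` are assumptions. The git flags are the ones described above, plus `--check` for the dry run.

```python
import subprocess
from pathlib import Path

APPLY_FLAGS = ["--recount", "--whitespace=fix"]


def apply_patch(repo_root: Path, artifacts_dir: Path, diff_text: str) -> bool:
    """Validate a diff with a dry run, then apply it from the repo root."""
    artifacts_dir = artifacts_dir.resolve()  # absolute path, safe with cwd=
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    patch_path = artifacts_dir / "fix.patch"  # hypothetical file name
    if not diff_text.endswith("\n"):
        diff_text += "\n"  # a missing final newline can break parsing
    patch_path.write_text(diff_text, encoding="utf-8")

    def git_apply(*extra: str) -> subprocess.CompletedProcess:
        return subprocess.run(
            ["git", "apply", *extra, *APPLY_FLAGS, str(patch_path)],
            cwd=repo_root,  # always run git from the repository root
            capture_output=True,
            text=True,
        )

    check = git_apply("--check")  # step 1: dry run, modifies nothing
    if check.returncode != 0:
        (artifacts_dir / "apply_error.txt").write_text(check.stderr, encoding="utf-8")
        return False
    return git_apply().returncode == 0  # step 2: the real apply
```

Saving the rejected patch and git's error message in the artifacts directory keeps failed attempts reviewable instead of silently discarding them.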

Verifying Results with Tests

After applying the patch, we rerun the test suite to verify the fix worked. This is the critical validation step that closes the automation loop.

The exit code provides an objective measure of success. Zero means the fix worked; non-zero means we need to try again or escalate to human review.
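A verification step based purely on the exit code can be sketched as follows. The name `verify_fix` and the log file name `test_after.txt` are assumptions, and the test command is again supplied by the caller.

```python
import subprocess
from pathlib import Path


def verify_fix(repo_root: Path, test_cmd: list[str], artifacts_dir: Path) -> bool:
    """Rerun the tests and report success based on the exit code alone."""
    result = subprocess.run(test_cmd, cwd=repo_root, capture_output=True, text=True)
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    # Keep the full log so a failed verification can be reviewed later.
    (artifacts_dir / "test_after.txt").write_text(
        result.stdout + result.stderr, encoding="utf-8"
    )
    return result.returncode == 0  # zero means the suite passed
```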

Handling Failure Scenarios

In this lesson, we focus on the success path where patches apply cleanly and tests pass. However, in production systems you'll encounter scenarios where:

Tests still fail after the patch is applied
The patch breaks other tests that were previously passing
The patch conflicts with uncommitted changes in the working directory

For these situations, you'd add rollback logic using git stash or work on isolated feature branches before merging to main. You might also implement retry logic with different prompts or escalate to human review after N failed attempts.
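As a rough preview of what such logic could look like, the sketch below pairs a bounded retry loop with a stash-based rollback. Both function names are hypothetical, and the stash approach shown here is deliberately blunt: it sets aside every uncommitted change, including edits that existed before the run.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def stash_rollback(repo_root: Path) -> None:
    """Set aside all uncommitted changes, returning to the last commit."""
    # Caution: this also stashes edits made before the automation started.
    subprocess.run(
        ["git", "stash", "push", "--include-untracked",
         "--message", "engineer-loop rollback"],
        cwd=repo_root,
        check=True,
    )


def run_with_retries(
    attempt: Callable[[int], bool],
    rollback: Callable[[], None],
    max_attempts: int = 3,
) -> Optional[int]:
    """Return the attempt number that succeeded, or None to escalate."""
    for n in range(1, max_attempts + 1):
        if attempt(n):  # e.g. generate, apply, and verify one patch
            return n
        rollback()      # undo the failed attempt before trying again
    return None         # caller escalates to human review
```

Passing the attempt number into `attempt` leaves room for varying the prompt on each retry.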

Unit 4 will cover these safety and recovery patterns in depth. For now, understanding the deterministic success path gives you the foundation to build robust error handling later.

Generating Comprehensive Reports

The final stage documents everything: what was attempted, what changed, and whether it succeeded. This creates the audit trail required for production automation.

By capturing both the planned diff and the final repository state, we create complete visibility into what the automation attempted and achieved.
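A report writer in this spirit could be as small as the sketch below. The JSON layout, the field names, and the file name `report.json` are assumptions. The lesson's own report format may differ.

```python
import json
from pathlib import Path


def write_report(
    artifacts_dir: Path,
    plan: str,
    planned_diff: str,
    final_git_diff: str,
    tests_passed: bool,
) -> Path:
    """Persist what was attempted, what changed, and whether it succeeded."""
    report = {
        "plan": plan,                      # what the model intended
        "planned_diff": planned_diff,      # what it proposed to change
        "final_git_diff": final_git_diff,  # what the repository looks like now
        "tests_passed": tests_passed,      # objective outcome
    }
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    path = artifacts_dir / "report.json"
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```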

Orchestrating the Complete Pipeline

Now we connect all stages into a single execution flow. Each stage feeds into the next, creating the deterministic loop that makes automation reliable.
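The control flow can be sketched independently of any particular stage implementation. In the example below, the `Stages` container and `run_engineer_loop` are hypothetical names, and each stage is passed in as a plain callable so the ordering logic stays visible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """Hypothetical container for the five stage functions."""
    collect: Callable[[], dict]      # -> repository context
    generate: Callable[[dict], dict]  # context -> {"plan": ..., "diff": ...}
    apply: Callable[[str], bool]      # diff -> applied cleanly?
    verify: Callable[[], bool]        # -> tests pass?
    report: Callable[[dict], None]    # summary -> persisted report


def run_engineer_loop(stages: Stages) -> bool:
    """Run collect -> generate -> apply -> verify -> report, in that order."""
    context = stages.collect()
    patch = stages.generate(context)
    applied = stages.apply(patch["diff"])
    # Only rerun the tests if the patch actually applied.
    passed = applied and stages.verify()
    # The report is written on every path, including failures.
    stages.report(
        {
            "plan": patch["plan"],
            "diff": patch["diff"],
            "applied": applied,
            "tests_passed": passed,
        }
    )
    return passed
```

Injecting the stages this way also makes the loop easy to exercise with fake stages before pointing it at a real repository.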

Example Output

When you run this complete pipeline, you'll see output like:

Understanding Determinism in Automation

The key to reliable automation is determinism: given the same inputs, each stage of the pipeline behaves in a consistent, predictable way, even though the model's suggestions may vary between runs. We achieve this through:

Structured outputs (JSON-only prompting and robust parsing remove ambiguity)
Validated patches (two-step git apply prevents partial modifications)
Exit codes (objective pass/fail signals)
Timestamped artifacts (isolated storage prevents run-to-run interference)

This design makes the loop suitable for CI/CD systems where failures must be reproducible and outcomes must be verifiable.

Conclusion and Next Steps

You've now built a complete engineer loop that autonomously:

Collects repository context (git state + test failures)
Generates structured patches using JSON-only prompting and robust parsing
Applies changes safely using validated git operations
Verifies fixes by rerunning tests
Persists comprehensive reports in timestamped directories

This is the foundation of autonomous development agents. In the practice section, you'll implement this pipeline yourself in the excalidraw repository, gaining hands-on experience with deterministic AI-driven workflows.
