Welcome back to Mastering Advanced AI Tooling in Codex! We're now at lesson 3, making steady progress through the course. In our previous lessons, we explored how to configure Codex through config.toml and how to safely enable web search capabilities with appropriate security controls.
Today, we're going to learn something powerful and practical: how to create custom Skills that automate repetitive workflows. Skills let us encode our team's review standards, testing practices, and quality checks directly into Codex, turning it from a general assistant into a specialized tool that understands our project's specific needs. By the end of this lesson, we'll have built a complete code-review Skill that performs thorough code reviews with consistent structure and depth.
Codex comes with several built-in slash commands like /review (review your working tree), /model (switch models), and /approvals (manage tool approvals). These are part of Codex itself—you use them but don't define new ones.
When you need custom, reusable workflows that encode team-specific expertise, you create Skills. A Skill is a specialized capability stored in a SKILL.md file that Codex can invoke when relevant or when explicitly called. Think of Skills as recipes for AI interactions: each encapsulates a clear role definition (what expertise Codex should apply), a specific task (what we want accomplished), constraints on behavior (what to focus on or avoid), and an output format (how to present results).
For our code review workflow, we have two options:
Option A (Simpler): Use the built-in /review command and rely on AGENTS.md to define what "good review" means for this project. This works well when your review standards are general project context that should apply across all interactions.
Option B (More Control): Create a code-review Skill that implements your exact review rubric as an explicit, invokable workflow. This is better when you want a specialized review process distinct from general Codex behavior.
Skills can be defined in two locations depending on scope:
Repository-level Skills go in .codex/skills/<skill-name>/SKILL.md at the root of the repository. These are team-shared capabilities that capture project-specific workflows: how this codebase should be reviewed, tested, or documented. Every team member working on the repository gets these Skills automatically.
User-level Skills go in ~/.codex/skills/<skill-name>/SKILL.md. These are personal workflow tools that may not apply to everyone: your specific IDE preferences, your individual productivity patterns, or experimental workflows you're testing before proposing to the team.
For our code-review Skill, we'll define it as a repository-level Skill since code review standards should be consistent across the team. This ensures that whether Alice or Bob invokes the Skill, they both apply the same rigor and produce comparable results.
Before writing any prompt text, we need to clearly specify what our code-review Skill should accomplish. A well-designed Skill has three components: clear inputs, defined outputs, and explicit constraints.
For code-review, the input should be flexible: either the user provides branch context explicitly, or Codex automatically uses the current git diff if available. The output needs to be structured and actionable: a concise summary, risk flags with severity, test coverage analysis, documentation impact assessment, and a final recommendation. The constraints ensure we stay focused: only review changed code, avoid subjective style comments unless they affect correctness, and prioritize high-impact issues over minor improvements.
This upfront design prevents scope creep. Without clear boundaries, a review Skill might drift into refactoring entire files, rewriting tests, or bikeshedding variable names instead of focusing on the actual changes and their risks.
We create the Skill directory structure and file:
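On a POSIX shell, that setup is a single command (using the repository-level path from this lesson):

```shell
# Create the repository-level Skill directory and an empty SKILL.md
mkdir -p .codex/skills/code-review
touch .codex/skills/code-review/SKILL.md
```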
Then create .codex/skills/code-review/SKILL.md with our Skill definition. We start with YAML front matter that helps Codex discover and invoke the Skill:
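A sketch of the opening of SKILL.md, using the name and description fields described below (the exact front-matter schema may vary across Codex versions, so check your version's documentation):

```markdown
---
name: code-review
description: Perform a structured review of the current changes, flagging
  risks, test gaps, and documentation impact.
---

## Role

You are a senior reviewer for this repository. Apply rigorous, pragmatic
judgment: prioritize correctness, security, and maintainability over style.
```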
The YAML front matter is crucial: the name field becomes the Skill's identifier (what you'll type to invoke it), while description helps Codex understand when this Skill is relevant. Without this metadata, Codex may not reliably discover or prioritize the Skill. The Role section then establishes the Skill's identity and sets expectations for the expertise lens to apply during review.
Next, we instruct Codex on where to get the code to review. We want the Skill to work flexibly, but we need to be realistic about what's possible:
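The input-resolution rules might be written as a section like this (the wording is illustrative):

```markdown
## Input

Determine what to review, in order of preference:

1. If the user names a branch or commit range, review that diff
   (requires tool execution: you will run git commands such as `git diff`).
2. Otherwise, if the current working tree has changes, review the output
   of `git diff` (and `git diff --staged` for staged work).
3. If neither applies, ask the user what they would like reviewed.
```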
This three-tier approach ensures the Skill always has something to work with. The default case (current working tree) is seamless and requires no setup—developers working on changes can invoke the Skill and get instant feedback on their work in progress.
The branch case is marked as requiring tool execution because Codex will need to run git commands. This means you'll need to approve tool usage when prompted, and your repository must be properly configured with remotes. We're not promising automatic PR fetching from GitHub or other providers, since that typically requires additional integrations or API setup that may not be available in all contexts.
The fallback option (asking the user) prevents confusion when someone runs the Skill in a clean working directory. Rather than producing an empty or error response, Codex prompts for clarification, maintaining a helpful interaction flow.
We need to specify exactly how Codex should summarize the changes. Structured output makes reviews more useful and easier to scan:
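One way to phrase the summary instructions in SKILL.md (illustrative wording):

```markdown
## Summary

- Summarize the change in 5-10 bullets.
- Focus on functional changes, not line-by-line details.
- Use present tense ("Adds retry logic", not "Added retry logic").
```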
By requesting 5-10 bullets, we force conciseness while ensuring adequate coverage. Fewer than 5 bullets might miss important aspects of a complex change; more than 10 suggests the summary is too granular and should be consolidated.
The instruction to focus on functional changes rather than line-by-line details keeps the summary at the right abstraction level. We want to understand what changed and why, not get a play-by-play of every modified line. The present tense convention ("Adds" not "Added") creates consistency and reads more naturally in the context of reviewing current work.
The most valuable part of an automated review is catching potential problems before they reach production. We need to enumerate the specific risks we want flagged:
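One possible checklist (the categories and examples are illustrative; tailor them to your codebase):

```markdown
## Risk Flags

Check the diff against each category and flag concrete findings:

- Security: injection risks (SQL, command), unsafe deserialization,
  secrets in code, missing input validation.
- Correctness: off-by-one errors, unhandled error paths, race conditions.
- Performance: N+1 queries, unbounded loops, large allocations in hot paths.
- Compatibility: breaking API changes, schema or config migrations.

For each flag, state a severity (high / medium / low) and suggest a mitigation.
```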
This checklist gives Codex concrete patterns to look for rather than vague "check for issues." Each category includes examples that help the agent recognize similar problems: if we mention "injection risks," Codex knows to look for SQL injection, command injection, and related vulnerabilities.
The requirement to specify severity and mitigation for each risk ensures actionable output. A risk flag without severity guidance might cause unnecessary alarm; a flag without mitigation suggestions leaves developers uncertain how to respond.
Code reviews should verify that changes come with appropriate tests. We instruct Codex to evaluate test quality and completeness:
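The test-coverage section might read (illustrative wording):

```markdown
## Test Coverage

Answer three questions:

1. Are there tests for the changed behavior?
2. Do the tests exercise the main paths the change introduces?
3. Do they cover edge cases (null/empty inputs, errors, boundaries)?

If coverage is thin, suggest 2-3 specific tests to add.
```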
The three questions create a progression from basic ("Are there tests?") to sophisticated ("Do they cover edge cases?"). This ensures that even if tests exist, we evaluate whether they're comprehensive enough.
The request for specific test suggestions (2-3 examples) is crucial. Generic advice like "add more tests" isn't actionable; concrete suggestions like "add a test for null input handling" or "test the timeout behavior" give developers clear next steps.
Changes often require documentation updates that are easy to forget. We add a documentation and CHANGELOG check:
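A sketch of that check as a SKILL.md section:

```markdown
## Documentation & Changelog

1. Do docstrings on changed functions still match their behavior?
2. Do the README or guides need updates for user-facing changes?
3. Should this change be noted in the CHANGELOG?

Name the specific files and sections that need updating.
```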
By asking three targeted questions, we ensure consideration of different documentation types. Docstrings live close to code and should always stay synchronized. README and guides provide user-facing information that must match actual behavior. Changelogs help users understand version-to-version changes.
The instruction to identify specific documentation makes the output actionable. Rather than "update docs," we want "refreshTokenAsync method needs docstring for the new timeout parameter" or "README section on authentication needs to mention the new OAuth flow."
Every review should conclude with a clear recommendation. We specify exactly how this decision should be structured:
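The recommendation section might be specified like this (emoji choices are illustrative):

```markdown
## Recommendation

End with exactly one of:

- ✅ Approve
- 🔁 Request changes
- 💬 Needs discussion

Follow it with a 2-3 sentence rationale for the decision.
```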
The three-option format (approve, request changes, needs discussion) maps to common code review workflows while avoiding ambiguity. The emoji prefix makes the recommendation immediately visible when scanning output.
Requiring rationale (2-3 sentences) ensures the recommendation isn't arbitrary. The explanation might say, "Approve because security and correctness concerns are addressed; performance optimization can happen in a follow-up," or "Request changes due to the SQL injection vulnerability in the search handler." This explanation helps developers understand the decision and provides learning value beyond the immediate review.
Now that we've fully defined our Skill, we need to understand how to actually run it. Codex supports two complementary approaches for working with Skills, each suited to different scenarios:
Interactive Picker (Discovery Mode)
The interactive approach lets you browse and select Skills through a menu interface:
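In the Codex session, that means typing the command with no arguments:

```
/skills
```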
When you type /skills without any arguments, Codex opens a picker showing all available Skills (both repository-level and user-level). You can browse the list, read descriptions, and select the Skill you want to run. This method is ideal when you're:
- Exploring what Skills exist in a new project
- Learning what a Skill does before committing to run it
- Unsure of the exact Skill name
- Working with Skills for the first time
The picker interface helps with discovery and reduces the cognitive load of remembering exact Skill names. It's particularly valuable when joining a new team or repository where you need to learn the available workflows.
Direct Invocation (Automation Mode)
The direct approach runs a Skill immediately by specifying its name:
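Assuming the Skill name from our example, the invocation is the command followed by the name:

```
/skills code-review
```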
This syntax executes the code-review Skill directly without showing a menu. Direct invocation is better when you:
- Know exactly which Skill you need
- Want to script or automate Skill execution
- Need faster execution in repeated workflows
- Are integrating Skills into CI/CD pipelines
For our code-review Skill, you might use the interactive picker the first few times to get familiar with its output format, then switch to direct invocation once you're running reviews regularly as part of your development workflow.
Both methods produce identical results—they're just different entry points to the same functionality. Throughout this course, we'll practice both approaches so you build fluency with the full toolset and can choose the right method for each situation.
When our code-review Skill runs, it produces structured output following our specified format. Here's what a typical review looks like:
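An illustrative run (the file names and findings below are invented for the example):

```markdown
## Summary
- Adds a /search endpoint to the API router.
- Introduces a query-builder helper for filtering results.
- Adds pagination parameters (page, page_size) with defaults.
- Updates the service layer to pass filters through to the repository.
- Extends the OpenAPI spec with the new endpoint.

## Risk Flags
- [High] Possible SQL injection: the query builder interpolates the raw
  sort_by parameter. Mitigation: validate against an allowlist of columns.
- [Low] page_size is unbounded; a very large value could strain the
  database. Mitigation: cap it at a sensible maximum.

## Test Coverage
Tests exist for the happy path but not for edge cases. Suggested additions:
- Test sort_by with an unexpected column name.
- Test page=0 and negative page_size values.
- Test an empty result set.

## Documentation
- search_service.py: docstring needs the new sort_by parameter.
- README "API" section: document the /search endpoint.

## Recommendation
🔁 Request changes. The sort_by interpolation is a likely SQL injection
vector and should be fixed before merge; the remaining items can be
addressed in follow-ups.
```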
Notice how the output precisely follows our specified structure. Each section addresses the requirements we defined: the summary uses 5 bullets with present tense, risks include severity and mitigation, test suggestions are specific and actionable, documentation identifies exact files to update, and the recommendation explains its reasoning clearly. This consistency makes reviews predictable and easy to act upon.
The code-review Skill we built is comprehensive, but your team might need a different emphasis or additional checks. Skills are templates; customize them for your context.
For example, if your team works with infrastructure code, add a section for deployment impact: "Does this change require database migrations? Will it cause downtime? Are rollback procedures documented?" For frontend teams, add accessibility checks: "Are new UI components keyboard navigable? Do images have alt text? Is color contrast sufficient?"
The key is to encode what experienced team members check manually into the Skill so that everyone applies the same standards consistently. Think of each Skill as crystallizing tribal knowledge into a reusable, shareable format.
One powerful aspect of defining Skills in .codex/skills/ is that these files become documentation. New team members can browse the skills directory to understand the project's standards, workflows, and priorities.
Skills also evolve with the project. When the team decides that all new features require performance benchmarks, update the code-review Skill to check for benchmark presence. When a security incident reveals a new class of vulnerability, add it to the risk flags checklist. The Skill becomes a living document of lessons learned and standards established.
This is why Skills live in version control: they're versioned alongside code, reviewed in pull requests, and improved iteratively just like any other project artifact.
An important design principle: keep Skills concise and structured. Codex can use progressive disclosure, pulling in Skill content only when relevant rather than loading all instructions into context upfront. This means well-organized Skills with clear sections are both easier to maintain and more efficient at runtime.
In this lesson, we've learned how to create custom Skills that encode team standards and automate repetitive workflows. We built a complete code-review Skill that performs comprehensive code reviews with consistent structure: summarizing changes, flagging risks across multiple dimensions, analyzing test coverage, checking documentation impact, and providing clear recommendations with rationale.
We've seen how Skills live in .codex/skills/ for team-shared workflows, how to design Skill contracts with clear inputs and outputs, and how structured specifications ensure consistent, actionable results. We also explored the simpler option of using the built-in /review command with AGENTS.md standards for teams that don't need the full control of custom Skills.
The key insight is that Skills transform tribal knowledge into reusable automation: the review standards that senior developers apply instinctively become available to the entire team through an explicit invocation.
Custom Skills turn Codex from a general assistant into a specialized tool that understands your project's specific needs, standards, and workflows. By investing time in well-designed Skills, we make every interaction more efficient and every output more valuable.
Ready to create Skills that capture your team's expertise? The upcoming practice exercises will have you defining custom Skills for different workflows, testing them against real code changes, and iterating on their design based on the quality of output they produce!
