Welcome back to Mastering Advanced AI Tooling in Codex! We're now at lesson 3, making steady progress through the course. In our previous lessons, we explored how to configure Codex through config.toml and how to safely enable web search capabilities with appropriate security controls.
Today, we're going to learn something powerful and practical: how to create custom Skills that automate repetitive workflows. Skills let us encode our team's review standards, testing practices, and quality checks directly into Codex, turning it from a general assistant into a specialized tool that understands our project's specific needs. By the end of this lesson, we'll have built a complete code-review Skill that performs thorough code reviews with consistent structure and depth.
Codex comes with several built-in slash commands like /review (review your working tree), /model (switch models), and /approvals (manage tool approvals). These are part of Codex itself—you use them but don't define new ones.
When you need custom, reusable workflows that encode team-specific expertise, you create Skills. A Skill is a specialized capability stored in a SKILL.md file that Codex can invoke when relevant or when explicitly called. Think of Skills as recipes for AI interactions: each encapsulates a clear role definition (what expertise Codex should apply), a specific task (what we want accomplished), constraints on behavior (what to focus on or avoid), and an output format (how to present results).
For our code review workflow, we have two options:
Option A (Simpler): Use the built-in /review command and rely on AGENTS.md to define what "good review" means for this project. This works well when your review standards are general project context that should apply across all interactions.
Option B (More Control): Create a code-review Skill that implements your exact review rubric as an explicit, invokable workflow. This is better when you want a specialized review process distinct from general Codex behavior.
Skills can be defined in two locations depending on scope:
Repository-level Skills go in .codex/skills/<skill-name>/SKILL.md at the root of the repository. These are team-shared capabilities that capture project-specific workflows: how this codebase should be reviewed, tested, or documented. Every team member working on the repository gets these Skills automatically.
User-level Skills go in ~/.codex/skills/<skill-name>/SKILL.md. These are personal workflow tools that may not apply to everyone: your specific IDE preferences, your individual productivity patterns, or experimental workflows you're testing before proposing to the team.
For our code-review Skill, we'll define it as a repository-level Skill since code review standards should be consistent across the team. This ensures that whether Alice or Bob invokes the Skill, they both apply the same rigor and produce comparable results.
Before writing any prompt text, we need to clearly specify what our code-review Skill should accomplish. A well-designed Skill has three components: clear inputs, defined outputs, and explicit constraints.
For code-review, the input should be flexible: either the user provides branch context explicitly, or Codex automatically uses the current git diff if available. The output needs to be structured and actionable: a concise summary, risk flags with severity, test coverage analysis, documentation impact assessment, and a final recommendation. The constraints ensure we stay focused: only review changed code, avoid subjective style comments unless they affect correctness, and prioritize high-impact issues over minor improvements.
This upfront design prevents scope creep. Without clear boundaries, a review Skill might drift into refactoring entire files, rewriting tests, or bikeshedding variable names instead of focusing on the actual changes and their risks.
We create the Skill directory structure and file:
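On a POSIX shell, that setup is a single command (using the repository-level path from this lesson):

```shell
# Create the repository-level Skill directory and an empty SKILL.md
mkdir -p .codex/skills/code-review
touch .codex/skills/code-review/SKILL.md
```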
Then create .codex/skills/code-review/SKILL.md with our Skill definition. We start with YAML front matter that helps Codex discover and invoke the Skill:
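A sketch of the opening of SKILL.md, using the name and description fields described below (the exact front-matter schema may vary across Codex versions, so check your version's documentation):

```markdown
---
name: code-review
description: Perform a structured review of the current changes, flagging
  risks, test gaps, and documentation impact.
---

## Role

You are a senior reviewer for this repository. Apply rigorous, pragmatic
judgment: prioritize correctness, security, and maintainability over style.
```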
The YAML front matter is crucial: the name field becomes the Skill's identifier (what you'll type to invoke it), while description helps Codex understand when this Skill is relevant. Without this metadata, Codex may not reliably discover or prioritize the Skill. The Role section then establishes the Skill's identity and sets expectations for the expertise lens to apply during review.
Next, we instruct Codex on where to get the code to review. We want the Skill to work flexibly, but we need to be realistic about what's possible:
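The input-resolution rules might be written as a section like this (the wording is illustrative):

```markdown
## Input

Determine what to review, in order of preference:

1. If the user names a branch or commit range, review that diff
   (requires tool execution: you will run git commands such as `git diff`).
2. Otherwise, if the current working tree has changes, review the output
   of `git diff` (and `git diff --staged` for staged work).
3. If neither applies, ask the user what they would like reviewed.
```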
This three-tier approach ensures the Skill always has something to work with. The default case (current working tree) is seamless and requires no setup—developers working on changes can invoke the Skill and get instant feedback on their work in progress.
The branch case is marked as requiring tool execution because Codex will need to run git commands. This means you'll need to approve tool usage when prompted, and your repository must be properly configured with remotes. We're not promising automatic PR fetching from GitHub or other providers, since that typically requires additional integrations or API setup that may not be available in all contexts.
The fallback option (asking the user) prevents confusion when someone runs the Skill in a clean working directory. Rather than producing an empty or error response, Codex prompts for clarification, maintaining a helpful interaction flow.
We need to specify exactly how Codex should summarize the changes. Structured output makes reviews more useful and easier to scan:
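One way to phrase the summary instructions in SKILL.md (illustrative wording):

```markdown
## Summary

- Summarize the change in 5-10 bullets.
- Focus on functional changes, not line-by-line details.
- Use present tense ("Adds retry logic", not "Added retry logic").
```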
By requesting 5-10 bullets, we force conciseness while ensuring adequate coverage. Fewer than 5 bullets might miss important aspects of a complex change; more than 10 suggests the summary is too granular and should be consolidated.
The instruction to focus on functional changes rather than line-by-line details keeps the summary at the right abstraction level. We want to understand what changed and why, not get a play-by-play of every modified line. The present tense convention ("Adds" not "Added") creates consistency and reads more naturally in the context of reviewing current work.
The most valuable part of an automated review is catching potential problems before they reach production. We need to enumerate the specific risks we want flagged:
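One possible checklist (the categories and examples are illustrative; tailor them to your codebase):

```markdown
## Risk Flags

Check the diff against each category and flag concrete findings:

- Security: injection risks (SQL, command), unsafe deserialization,
  secrets in code, missing input validation.
- Correctness: off-by-one errors, unhandled error paths, race conditions.
- Performance: N+1 queries, unbounded loops, large allocations in hot paths.
- Compatibility: breaking API changes, schema or config migrations.

For each flag, state a severity (high / medium / low) and suggest a mitigation.
```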
This checklist gives Codex concrete patterns to look for rather than vague "check for issues." Each category includes examples that help the agent recognize similar problems: if we mention "injection risks," Codex knows to look for SQL injection, command injection, and related vulnerabilities.
The requirement to specify severity and mitigation for each risk ensures actionable output. A risk flag without severity guidance might cause unnecessary alarm; a flag without mitigation suggestions leaves developers uncertain how to respond.
Code reviews should verify that changes come with appropriate tests. We instruct Codex to evaluate test quality and completeness:
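The test-coverage section might read (illustrative wording):

```markdown
## Test Coverage

Answer three questions:

1. Are there tests for the changed behavior?
2. Do the tests exercise the main paths the change introduces?
3. Do they cover edge cases (null/empty inputs, errors, boundaries)?

If coverage is thin, suggest 2-3 specific tests to add.
```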
The three questions create a progression from basic ("Are there tests?") to sophisticated ("Do they cover edge cases?"). This ensures that even if tests exist, we evaluate whether they're comprehensive enough.
The request for specific test suggestions (2-3 examples) is crucial. Generic advice like "add more tests" isn't actionable; concrete suggestions like "add a test for null input handling" or "test the timeout behavior" give developers clear next steps.
Changes often require documentation updates that are easy to forget. We add a documentation and CHANGELOG check:
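A sketch of that check as a SKILL.md section:

```markdown
## Documentation & Changelog

1. Do docstrings on changed functions still match their behavior?
2. Do the README or guides need updates for user-facing changes?
3. Should this change be noted in the CHANGELOG?

Name the specific files and sections that need updating.
```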
By asking three targeted questions, we ensure consideration of different documentation types. Docstrings live close to code and should always stay synchronized. README and guides provide user-facing information that must match actual behavior. Changelogs help users understand version-to-version changes.
The instruction to identify specific documentation makes the output actionable. Rather than "update docs," we want "refreshTokenAsync method needs docstring for the new timeout parameter" or "README section on authentication needs to mention the new OAuth flow."
Every review should conclude with a clear recommendation. We specify exactly how this decision should be structured:
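The recommendation section might be specified like this (emoji choices are illustrative):

```markdown
## Recommendation

End with exactly one of:

- ✅ Approve
- 🔁 Request changes
- 💬 Needs discussion

Follow it with a 2-3 sentence rationale for the decision.
```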
The three-option format (approve, request changes, needs discussion) maps to common code review workflows while avoiding ambiguity. The emoji prefix makes the recommendation immediately visible when scanning output.
Requiring rationale (2-3 sentences) ensures the recommendation isn't arbitrary. The explanation might say, "Approve because security and correctness concerns are addressed; performance optimization can happen in a follow-up," or "Request changes due to the SQL injection vulnerability in the search handler." This explanation helps developers understand the decision and provides learning value beyond the immediate review.
Now that we've fully defined our Skill, we need to understand how to actually run it. Codex supports two complementary approaches for working with Skills, each suited to different scenarios:
Interactive Picker (Discovery Mode)
The interactive approach lets you browse and select Skills through a menu interface:
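In the Codex session, that means typing the command with no arguments:

```
/skills
```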
When you type /skills without any arguments, Codex opens a picker showing all available Skills (both repository-level and user-level). You can browse the list, read descriptions, and select the Skill you want to run. This method is ideal when you're:
- Exploring what Skills exist in a new project
- Learning what a Skill does before committing to run it
- Unsure of the exact Skill name
- Working with Skills for the first time
The picker interface helps with discovery and reduces the cognitive load of remembering exact Skill names. It's particularly valuable when joining a new team or repository where you need to learn the available workflows.
Direct Invocation (Automation Mode)
The direct approach runs a Skill immediately by specifying its name:
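Assuming the Skill name from our example, the invocation is the command followed by the name:

```
/skills code-review
```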
This syntax executes the code-review Skill directly without showing a menu. Direct invocation is better when you:
- Know exactly which Skill you need
- Want to script or automate Skill execution
- Need faster execution in repeated workflows
- Are integrating Skills into CI/CD pipelines
For our code-review Skill, you might use the interactive picker the first few times to get familiar with its output format, then switch to direct invocation once you're running reviews regularly as part of your development workflow.
Both methods produce identical results—they're just different entry points to the same functionality. Throughout this course, we'll practice both approaches so you build fluency with the full toolset and can choose the right method for each situation.
When our code-review Skill runs, it produces structured output following our specified format. Here's what a typical review looks like:
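An illustrative run (the file names and findings below are invented for the example):

```markdown
## Summary
- Adds a /search endpoint to the API router.
- Introduces a query-builder helper for filtering results.
- Adds pagination parameters (page, page_size) with defaults.
- Updates the service layer to pass filters through to the repository.
- Extends the OpenAPI spec with the new endpoint.

## Risk Flags
- [High] Possible SQL injection: the query builder interpolates the raw
  sort_by parameter. Mitigation: validate against an allowlist of columns.
- [Low] page_size is unbounded; a very large value could strain the
  database. Mitigation: cap it at a sensible maximum.

## Test Coverage
Tests exist for the happy path but not for edge cases. Suggested additions:
- Test sort_by with an unexpected column name.
- Test page=0 and negative page_size values.
- Test an empty result set.

## Documentation
- search_service.py: docstring needs the new sort_by parameter.
- README "API" section: document the /search endpoint.

## Recommendation
🔁 Request changes. The sort_by interpolation is a likely SQL injection
vector and should be fixed before merge; the remaining items can be
addressed in follow-ups.
```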
Notice how the output precisely follows our specified structure. Each section addresses the requirements we defined: the summary uses 5 bullets with present tense, risks include severity and mitigation, test suggestions are specific and actionable, documentation identifies exact files to update, and the recommendation explains its reasoning clearly. This consistency makes reviews predictable and easy to act upon.
The code-review Skill we built is comprehensive, but your team might need a different emphasis or additional checks. Skills are templates; customize them for your context.
For example, if your team works with infrastructure code, add a section for deployment impact: "Does this change require database migrations? Will it cause downtime? Are rollback procedures documented?" For frontend teams, add accessibility checks: "Are new UI components keyboard navigable? Do images have alt text? Is color contrast sufficient?"
The key is to encode what experienced team members check manually into the Skill so that everyone applies the same standards consistently. Think of each Skill as crystallizing tribal knowledge into a reusable, shareable format.
One powerful aspect of defining Skills in .codex/skills/ is that these files become documentation. New team members can browse the skills directory to understand the project's standards, workflows, and priorities.
Skills also evolve with the project. When the team decides that all new features require performance benchmarks, update the code-review Skill to check for benchmark presence. When a security incident reveals a new class of vulnerability, add it to the risk flags checklist. The Skill becomes a living document of lessons learned and standards established.
This is why Skills live in version control: they're versioned alongside code, reviewed in pull requests, and improved iteratively just like any other project artifact.
An important design principle: keep Skills concise and structured. Codex can use progressive disclosure, pulling in Skill content only when relevant rather than loading all instructions into context upfront. This means well-organized Skills with clear sections are both easier to maintain and more efficient at runtime.
In this lesson, we've learned how to create custom Skills that encode team standards and automate repetitive workflows. We built a complete code-review Skill that performs comprehensive code reviews with consistent structure: summarizing changes, flagging risks across multiple dimensions, analyzing test coverage, checking documentation impact, and providing clear recommendations with rationale.
We've seen how Skills live in .codex/skills/ for team-shared workflows, how to design Skill contracts with clear inputs and outputs, and how structured specifications ensure consistent, actionable results. We also explored the simpler option of using the built-in /review command with AGENTS.md standards for teams that don't need the full control of custom Skills.
The key insight is that Skills transform tribal knowledge into reusable automation: the review standards that senior developers apply instinctively become available to the entire team through an explicit invocation.
Custom Skills turn Codex from a general assistant into a specialized tool that understands your project's specific needs, standards, and workflows. By investing time in well-designed Skills, we make every interaction more efficient and every output more valuable.
Ready to create Skills that capture your team's expertise? The upcoming practice exercises will have you defining custom Skills for different workflows, testing them against real code changes, and iterating on their design based on the quality of output they produce!
