Welcome to Unit 5! In the previous unit, you learned to evaluate specifications. Now you will learn the actual spec-driven development (SDD) workflow: collaborating with Codex to generate specifications.
The traditional mistake was believing SDD meant writing detailed specifications by hand before coding. That approach failed because it was too time-consuming. Modern SDD with AI changes everything. Humans provide requirements and intent through structured prompts, while AI generates formal specifications using templates. Together, you create better specifications faster than jumping straight to code.
This lesson teaches you the partnership model: how to guide AI to generate high-quality specifications, how to review them critically, and how to refine them efficiently. You are not a specification writer—you are a specification architect.
The key to effective SDD with AI is understanding each party's role and strengths. This partnership transforms specification creation from a bottleneck into an accelerator.
Your Role as Specification Architect:
You define WHAT you want by providing business intent and requirements. You review AI-generated specs critically, catching the issues that stem from domain knowledge the AI lacks. You guide refinement through targeted feedback and approve the specification when it meets the quality threshold of 75/100 or higher. Your domain expertise validates what the AI generates.
AI's Role as Specification Generator:
Codex systematically covers all template sections and considers edge cases you might forget. It maintains a consistent format across all specifications and generates complete technical plans. However, it needs your domain expertise to validate business logic and catch domain-specific constraints.
Why This Partnership Works:
Humans excel at understanding business needs, domain constraints, and user value. AI excels at systematic documentation, comprehensive coverage, and consistent formatting. By combining these strengths, you leverage AI's speed without losing the human judgment that ensures specifications meet real business needs.
The core principle is simple: humans define the WHAT and WHY, while AI generates the HOW. This division prevents the common mistake of having AI make business decisions or having humans waste time on formatting.
When you provide requirements, focus on business outcomes and constraints. Describe what success looks like from the user's perspective. Identify domain-specific rules and edge cases that matter to your business. The AI then takes these inputs and generates a comprehensive specification that covers all template sections, proposes validation rules, and structures the information consistently.
This approach is faster than writing code directly because you catch design issues early. A 15-minute specification review prevents hours of refactoring later. The specification becomes documentation that survives code changes, helping future developers understand intent and enabling confident modifications.
Understanding the complete workflow helps you work efficiently. Here is how features go from idea to implementation.
Step 1: Provide Structured Requirements
Start by giving Codex context about the system, existing features, and current user workflow. State your requirements with specific values rather than vague terms. Describe the user perspective: who needs this, what problem it solves, and what success looks like. Include domain constraints that affect the solution.
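To make Step 1 concrete, here is a sketch of what structured requirements might look like for a hypothetical tag-filtering feature. The feature, limits, and regex are invented for illustration; only the structure follows the guidance above.

```text
SYSTEM CONTEXT: Task management API (see CODEX.md for the tech stack and
standards). Related features: task creation, task search. Current workflow:
users list all tasks and scan manually for the ones they need.

REQUIREMENTS:
- Users can filter the task list by up to 10 tags per request.
- Tag names must match ^[a-z0-9-]{1,30}$.
- A task matches only if it carries every requested tag.
- Filtered responses return in under 200ms at the 95th percentile.

USER PERSPECTIVE: Project managers need to narrow long task lists to a single
initiative. Success: the filtered list contains only matching tasks and uses
the same response format as the unfiltered list.

DOMAIN CONSTRAINT: Archived tasks are never returned, even when their tags match.
```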
Step 2: AI Generates Comprehensive Specification
Codex follows the template to generate a specification covering all sections. It proposes validation rules, error scenarios, and success criteria. The output is formatted consistently with your existing specifications.
Step 3: Review Specification Critically
You validate business logic against domain knowledge and check for missing edge cases. Score the specification on five dimensions: clarity, completeness, testability, consistency, and appropriate abstraction. Identify specific issues that need addressing.
Step 4: AI Refines Based on Feedback
Provide targeted feedback on specific lines or sections. Codex addresses your concerns and adds missing details. This typically takes 1-2 iterations to reach the approval threshold.
Step 5: Approve Final Specification
Confirm the score is 75/100 or higher. Verify all critical sections are complete and accurate. Check that examples use realistic data formats.
Step 6: AI Generates Tests and Implementation
With an approved specification, Codex creates tests that verify the specified behavior. It then generates implementation code that passes those tests. The specification guides both test and code generation.
Step 7: Validate Against Business Needs
Run the tests to verify behavior matches the specification. Check edge cases and error handling. Confirm the implementation solves the actual user problem.
The quality of your prompt directly impacts the quality of the generated specification. A well-structured prompt can reduce iterations from 3-4 down to 1-2 while improving initial scores from 65/100 to 78-85/100.
The Enhanced Prompt Structure:
Your prompt should start by naming the feature and providing system context. Reference CODEX.md for tech stack and standards, list related existing features, and describe the current user workflow.
Guide the AI to Learn From Your Codebase:
The most powerful prompt element is ANALYZE EXISTING CODE. This section tells the AI where to look and what patterns to extract.
For example, if you are adding a new endpoint, point the AI to a similar endpoint. Tell it to extract the response format, validation patterns, and error handling approach. The AI will then match those patterns in the new specification.
State Requirements With Specificity:
Vague requirements produce vague specifications. Always use concrete values and measurable criteria.
Instead of "reasonable limit," say "10 max." Instead of "valid format," provide the actual regex pattern. Instead of "acceptable response time," specify "under 200ms at the 95th percentile."
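The payoff of specificity is that concrete values translate directly into checks. Below is a minimal Python sketch assuming the hypothetical tag-filtering requirements from Step 1; the function and constant names are illustrative, not part of the template.

```python
import re

# Concrete spec values ("10 max", an explicit regex) become checkable constants.
MAX_TAGS = 10
TAG_PATTERN = re.compile(r"^[a-z0-9-]{1,30}$")

def validate_tags(tags: list[str]) -> list[str]:
    """Return a list of violations of the specified limits (empty means valid)."""
    errors = []
    if len(tags) > MAX_TAGS:
        errors.append(f"too many tags: {len(tags)} exceeds the limit of {MAX_TAGS}")
    errors.extend(f"invalid tag name: {tag!r}" for tag in tags
                  if not TAG_PATTERN.match(tag))
    return errors

# A vague requirement like "reasonable limit" gives you nothing to assert against.
assert validate_tags(["backend", "q3-roadmap"]) == []
assert validate_tags(["Backend!"]) != []
```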
Set Format and Quality Expectations:
Always reference the template and provide an example specification as a quality benchmark.
The CONSTRAINTS section is critical. It prevents the AI from making implementation decisions or using placeholder values in examples. A few explicit constraint lines significantly improve specification quality.
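Putting these elements together, the remainder of the prompt (after the requirements shown in Step 1) might look like the sketch below. The file path is hypothetical, and the third constraint line is one plausible choice rather than a fixed rule.

```text
ANALYZE EXISTING CODE:
- Read the existing task search endpoint (e.g., src/api/tasks/search.py) and
  extract its response format, validation patterns, and error handling approach.
- Match those patterns in the new specification.

FORMAT:
- Follow the specification template.
- Use the task search specification as the quality benchmark.

CONSTRAINTS:
- Do not make implementation decisions; describe WHAT and WHY, not HOW.
- Do not use placeholder values in examples; use realistic data formats
  (UUID4 identifiers, ISO-8601 timestamps).
- Flag anything you cannot determine from the requirements as an open question
  instead of guessing.
```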
Score specifications out of 100 across five dimensions. This scoring system provides objective criteria for approval decisions and helps you give targeted feedback.
Dimension 1: Clarity (0-25 Points)
Clarity measures whether every term is precisely defined with no ambiguity. High-scoring specifications define formats with regex patterns, provide examples for abstract concepts, and eliminate words like "appropriate," "reasonable," or "valid" without explanation.
Score 20-25 when all terms are defined precisely with specific formats or patterns and there is zero ambiguous language. Score 15-19 when most terms are defined but one or two need more precision. Score 10-14 when multiple vague terms exist but core concepts are understandable. Score 0-9 when ambiguity prevents clear understanding of requirements.
Dimension 2: Completeness (0-25 Points)
Completeness measures whether all template sections are present and substantial. Check for [TBD] markers, "will be determined" phrases, or stub sections with only a sentence where a paragraph is needed.
Score 20-25 when all template sections are present and substantial with no placeholders. Score 15-19 when all sections exist but one or two are thin. Score 10-14 when one section is missing or multiple sections are stubs. Score 0-9 when multiple sections are missing or mostly empty.
Dimension 3: Testability (0-20 Points)
Testability measures whether scenarios are structured enough to write automated tests. Look for given/when/then format and concrete examples using real data formats like UUID4 and ISO-8601.
Score 16-20 when all scenarios follow given/when/then format and use realistic data. Score 12-15 when scenarios are structured but some examples use placeholders like "some-id" or "timestamp." Score 8-11 when scenarios exist but lack concrete structure or realistic data. Score 0-7 when there are no structured scenarios or only narrative descriptions.
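As a sketch of what Dimension 3 rewards, here is a given/when/then scenario expressed as a runnable test, continuing the hypothetical tag-filtering example. The in-file list_tasks function is a stand-in for the real endpoint, and the data uses realistic formats rather than placeholders.

```python
import uuid
from datetime import datetime, timezone

def list_tasks(tasks, tags):
    """Stand-in for the endpoint under test: keep tasks carrying every requested tag."""
    return [task for task in tasks if set(tags) <= set(task["tags"])]

def test_filter_returns_only_matching_tasks():
    # Given: two tasks with realistic identifiers and timestamps (not "some-id" or "timestamp")
    backend_task = {
        "id": str(uuid.uuid4()),                               # UUID4
        "created_at": datetime.now(timezone.utc).isoformat(),  # ISO-8601
        "tags": ["backend"],
    }
    docs_task = {**backend_task, "id": str(uuid.uuid4()), "tags": ["docs"]}
    # When: the client filters by the "backend" tag
    result = list_tasks([backend_task, docs_task], tags=["backend"])
    # Then: only the matching task is returned
    assert result == [backend_task]
```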
Dimension 4: Consistency (0-20 Points)
Consistency measures whether the specification follows your established conventions from CODEX.md. Check naming conventions, error formats, response structures, and API patterns.
Score 16-20 when the specification matches all CODEX.md conventions. Score 12-15 when it follows most conventions with only one or two minor deviations. Score 8-11 when there are multiple inconsistencies with existing API patterns. Score 0-7 when the specification introduces new patterns that contradict established conventions.
Dimension 5: Appropriate Abstraction (0-10 Points)
Appropriate abstraction measures whether the specification focuses on WHAT and WHY rather than HOW. Look for mentions of databases, code structure, or implementation details. Score high when the specification stays at the level of behavior and outcomes; score low when it prescribes schemas, class structures, or specific technologies.
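Once each dimension is scored, the approval decision is simple arithmetic. The helper below is a minimal sketch: the dimension maximums and the 75-point threshold come from this lesson, while the function name and example scores are arbitrary.

```python
# Dimension maximums from the rubric above; 25 + 25 + 20 + 20 + 10 = 100.
DIMENSION_MAX = {
    "clarity": 25,
    "completeness": 25,
    "testability": 20,
    "consistency": 20,
    "abstraction": 10,
}
APPROVAL_THRESHOLD = 75

def score_specification(scores: dict[str, int]) -> tuple[int, bool]:
    """Total the five dimension scores and report whether the spec is approved."""
    missing = DIMENSION_MAX.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name, value in scores.items():
        if not 0 <= value <= DIMENSION_MAX[name]:
            raise ValueError(f"{name} must be between 0 and {DIMENSION_MAX[name]}")
    total = sum(scores.values())
    return total, total >= APPROVAL_THRESHOLD

# Example: clear and complete, but inconsistent with existing conventions.
total, approved = score_specification({
    "clarity": 22, "completeness": 20, "testability": 16,
    "consistency": 10, "abstraction": 8,
})
print(total, approved)  # 76 True
```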
Time-box your reviews to maintain momentum while ensuring quality. Different review depths serve different purposes.
10-Minute Triage Review:
Quickly scan for obvious issues. Check that all template sections are present. Look for obviously vague terms like "appropriate" or "reasonable." Verify the specification matches CODEX.md conventions. Confirm examples use realistic data formats rather than placeholders.
30-Minute Deep Review:
Score each dimension systematically and identify specific issues. For each section, note line numbers and exact problems. Document what needs to change rather than just noting that something is wrong.
5-Minute Revision Check:
After the AI refines the specification, verify your feedback was addressed. Re-score the affected dimensions. Check if the total score reached 75 or higher.
Providing Effective Feedback:
Specificity is key. Point to exact line numbers and state precisely what needs to change. Write "Line 42: Replace 'valid format' with regex: ^[a-z0-9-]{1,30}$" rather than "Make it clearer." Show examples of what you want rather than describing it abstractly.
Your workflow now combines human judgment with AI speed. Prepare your prompt in 5 minutes by identifying related code or specifications and gathering requirements. Generate the specification in 2 minutes by providing the enhanced prompt to the AI. Review and score in 10-30 minutes by evaluating the five dimensions. Refine in 5-10 minutes per iteration by providing specific feedback and re-scoring. Approve when the score reaches 75 or higher and proceed to implementation.
With a well-structured prompt, this process typically reaches an approved specification in one or two iterations rather than the three or four that informal prompts require. More importantly, the resulting specification prevents design issues that would take hours to fix in code. You are not just creating documentation; you are architecting better features through thoughtful collaboration with AI.
