Welcome to Unit 5! You've learned to evaluate specifications in Units 1-4. Now you'll learn the actual Spec-Driven Development workflow: collaborating with AI to generate specifications.
The Traditional Mistake:
Many developers think SDD means "write detailed specifications by hand before coding." That's the old way, and it's why specification-driven approaches historically failed: they were too time-consuming.
The SDD Reality with AI:
In modern SDD with Claude Code:
- Humans provide requirements and intent (structured prompts)
- AI generates formal specifications using templates
- Humans review for quality and business correctness
- AI refines based on feedback
- Humans approve when spec meets quality threshold
Your Role in the Partnership:
You are NOT a specification writer. You are a specification architect:
- Define WHAT you want (business intent)
- Review AI-generated specs critically
- Catch domain knowledge AI doesn't have
- Guide refinement through targeted feedback
- Approve when quality threshold met (≥75/100)
AI's Role:
Claude Code is your specification generator:
- Systematically covers all template sections
- Considers edge cases you might forget
- Maintains consistent format
- Generates complete technical plans
- But needs your domain expertise to validate
Humans define WHAT and WHY. AI generates HOW.
This division leverages each party's strengths:
- Humans excel at understanding business needs, domain constraints, and user value
- AI excels at systematic documentation, comprehensive coverage, and consistent formatting
With AI assistance, creating a specification becomes faster than jumping straight to code, and it produces better outcomes.
Here's the typical workflow when building features with AI assistance:
1. Human provides structured requirements - Context, requirements, user perspective, constraints
2. AI generates comprehensive specification - Follows template, covers all sections
3. Human reviews critically - Validates business logic, checks domain accuracy, scores on 5 dimensions
4. AI refines based on feedback - Addresses specific issues, adds missing details
5. Human approves final spec - Confirms ≥75/100 score, verifies readiness
6. AI generates tests and implementation - Creates tests and code matching specification
7. Human validates against business needs - Runs tests, checks behavior
This workflow transforms specification from a bottleneck into an accelerator.
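The generate-review-refine loop above can be sketched in code. This is an illustrative sketch only: `generate_spec`, `review_spec`, and `refine_spec` are hypothetical stand-ins for the AI and human steps, not a real Claude Code API.

```python
# Illustrative sketch of the SDD review loop. The callables are
# hypothetical placeholders for the human and AI steps, not a real API.
APPROVAL_THRESHOLD = 75  # minimum score out of 100

def run_sdd_loop(requirements, generate_spec, review_spec, refine_spec, max_iterations=5):
    """Generate, review, and refine a spec until it scores >= 75/100."""
    spec = generate_spec(requirements)          # AI generates from a template
    for _ in range(max_iterations):
        score, feedback = review_spec(spec)     # human scores the 5 dimensions
        if score >= APPROVAL_THRESHOLD:
            return spec, score                  # approved: proceed to tests and code
        spec = refine_spec(spec, feedback)      # AI addresses specific feedback
    raise RuntimeError("Spec did not reach the approval threshold")
```

The loop makes the division of labor explicit: the AI produces and revises the artifact, while the human acts only as reviewer and gatekeeper.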
When you want to add a feature to TaskMaster, you start by giving Claude Code requirements in a structured prompt.
Essential Prompt Structure:
Three critical sections for quality:
- ANALYZE EXISTING CODE - AI extracts patterns from your codebase
- EXAMPLE REFERENCE - AI matches your existing spec quality/style
- CONSTRAINTS - Guards against implementation details and vague language
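To make the structure concrete, here is one way the three sections might be assembled into a single prompt. This is a hypothetical sketch: the helper name, file paths, and requirement wording are invented for illustration, not taken from a real TaskMaster codebase.

```python
# Hypothetical prompt assembly; paths and requirement text are invented
# for illustration only.
def build_spec_prompt(context, requirements, example_spec_path, constraints):
    return "\n\n".join([
        "ANALYZE EXISTING CODE:\n" + context,             # AI extracts patterns
        "REQUIREMENTS:\n" + requirements,                 # business intent (WHAT)
        "EXAMPLE REFERENCE:\nMatch the style of " + example_spec_path,
        "CONSTRAINTS:\n" + constraints,                   # guard against HOW details
    ])

prompt = build_spec_prompt(
    context="Review src/api/tasks.py for naming and error-format conventions.",
    requirements="Users can archive a task; archived tasks are hidden from the default list.",
    example_spec_path="specs/task-creation.md",
    constraints="Describe WHAT, not HOW. No database schemas. No vague terms like 'fast'.",
)
```

Keeping the sections in a fixed order makes prompts easy to reuse and compare across features.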
Impact on iterations: these three sections reduce the number of refinement cycles needed before a spec reaches the approval threshold.
Score specifications out of 100 across five dimensions:
1. Clarity (0-25) - Terms defined, no ambiguity (❌ "Valid format" → ✅ "Regex: ^[a-z0-9-]{1,30}$")
2. Completeness (0-25) - All template sections present, no [TBD] markers
3. Testability (0-20) - Given/When/Then format, concrete examples with real data
4. Consistency (0-20) - Follows CLAUDE.md, matches existing API patterns
5. Appropriate Abstraction (0-10) - WHAT not HOW, no database/code details
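The clarity example is concrete enough to check mechanically. A quick Python sketch validating the identifier pattern from the rubric (the function name is invented for illustration):

```python
import re

# The pattern from the clarity example: 1-30 characters of lowercase
# letters, digits, or hyphens. fullmatch anchors the whole string.
SLUG_PATTERN = re.compile(r"^[a-z0-9-]{1,30}$")

def is_valid_slug(value: str) -> bool:
    return SLUG_PATTERN.fullmatch(value) is not None

assert is_valid_slug("my-task-42")       # lowercase, digits, hyphens: valid
assert not is_valid_slug("My-Task")      # uppercase rejected
assert not is_valid_slug("")             # empty string rejected
assert not is_valid_slug("a" * 31)       # over the 30-character limit
```

This is exactly what "testable" means in the rubric: a precise format turns a reviewer's judgment call into a mechanical check.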
Calibrating Your Scores:
Use these ranges to score consistently:
Clarity (0-25):
- 20-25: All terms defined precisely with formats/patterns. Zero ambiguous language.
- 15-19: Most terms defined, 1-2 need precision (e.g., "reasonable limit" vs "10 max")
- 10-14: Multiple vague terms, but core concepts understandable
- 0-9: Ambiguity prevents understanding requirements
Completeness (0-25):
- 20-25: All template sections present with substance. No [TBD] or "will be determined"
- 15-19: All sections present, but 1-2 are thin (single sentence where paragraph needed)
- 10-14: Missing 1 section OR multiple sections are stubs
- 0-9: Multiple sections missing or mostly empty
Testability (0-20):
- 16-20: All scenarios in Given/When/Then. Examples use real formats (UUID4, ISO-8601)
- 12-15: Structured scenarios but some examples use placeholders ("some-id", "timestamp")
- 8-11: Scenarios present but lack concrete structure or realistic data
- 0-7: No structured scenarios, or only narrative descriptions
Consistency (0-20):
- 16-20: Matches all CLAUDE.md conventions (naming, error formats, response structure)
- 12-15: Follows most conventions, 1-2 minor deviations
- 8-11: Multiple inconsistencies with existing API patterns
- 0-7: Contradicts CLAUDE.md conventions or existing API patterns throughout
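The dimension maxima and the 75-point threshold can be encoded as a small checker. A sketch based on the rubric above; the dictionary keys are just labels chosen here for the five dimensions:

```python
# Dimension maxima from the scoring rubric (total = 100).
DIMENSION_MAX = {
    "clarity": 25,
    "completeness": 25,
    "testability": 20,
    "consistency": 20,
    "appropriate_abstraction": 10,
}
APPROVAL_THRESHOLD = 75

def total_score(scores: dict) -> int:
    """Sum dimension scores, rejecting values outside each dimension's range."""
    for name, value in scores.items():
        maximum = DIMENSION_MAX[name]   # KeyError on an unknown dimension
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")
    return sum(scores.values())

def is_approved(scores: dict) -> bool:
    return total_score(scores) >= APPROVAL_THRESHOLD
```

For example, a spec scoring 20 + 18 + 16 + 14 + 8 totals 76 and clears the threshold.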
10-Minute Triage: All sections present? Obvious vague terms? Matches conventions? Examples realistic?
30-Minute Deep Review: Score each dimension, identify specific issues
5-Minute Revision Check: Feedback addressed? Score ≥75?
Give specific feedback:
- ✅ "Line 42: Replace 'valid format' with regex: ^[a-z0-9-]{1,30}$"
- ❌ "Make it clearer"
1. Prepare prompt (5 min): Identify related code/specs, gather requirements
2. Generate (2 min): Provide enhanced prompt to Claude
3. Review and score (10-30 min): Evaluate on 5 dimensions
4. Refine (5-10 min/iteration): Specific feedback, re-score
5. Approve (≥75 score): Proceed to implementation
Result: 10-15 minutes to approved spec vs. 20-25 minutes with informal prompts.
