You've learned that tasks should be "atomic" - small, focused units. But there's a tension:
Too Large:
- AI loses focus across many files
- Context overflow leads to errors
- Difficult to review massive changes
- Hard to isolate bugs when things break
Too Small:
- Excessive context switching overhead
- Each task requires re-reading specs and understanding codebase
- Integration overhead connecting tiny pieces
- Workflow becomes unwieldy with 50+ micro-tasks
The Question: Where's the sweet spot? How do you know if a task is "just right"?
An atomic task has these properties working together:
1. Clear, Testable Completion Criteria
✅ GOOD (Observable, Measurable):
- CommentRepository has create() method returning Comment
- Unit tests pass: pytest tests/unit/test_comment_repository.py
- Coverage ≥90%
- Type checking passes: mypy src/repositories/comment_repository.py
❌ BAD (Vague, Subjective):
- Repository works well
- Code is clean
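Observable criteria like these translate directly into assertions. A minimal sketch, using the `CommentRepository.create()` criterion from the list above — the in-memory storage is an illustrative stand-in for whatever database layer the real task would use:

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Comment:
    id: int
    task_id: int
    content: str

class CommentRepository:
    """In-memory stand-in; the real task would wrap a database session."""
    def __init__(self) -> None:
        self._rows: dict = {}
        self._ids = count(1)

    def create(self, task_id: int, content: str) -> Comment:
        comment = Comment(id=next(self._ids), task_id=task_id, content=content)
        self._rows[comment.id] = comment
        return comment

# The criterion "create() returns Comment" becomes a direct, runnable check:
def test_create_returns_comment() -> None:
    comment = CommentRepository().create(task_id=1, content="hello")
    assert isinstance(comment, Comment)
    assert comment.content == "hello"
```

The point isn't this particular implementation — it's that every acceptance criterion can be phrased as a test you either pass or fail, with no room for "works well."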
2. Reasonable Implementation Scope (45-120 minutes)
✅ REASONABLE (60 min):
- Implement CommentRepository with 5 CRUD methods
- Write 8 unit tests with mocked DB
❌ TOO LARGE (>2 hours):
- Implement entire commenting system (8 files, 6 hours)
❌ TOO SMALL (<30 min):
- Add one method to existing repository (15 minutes)
Why 45-120 minutes?
- Below 45 min: Setup overhead exceeds implementation time
- Above 120 min: Fatigue sets in, AI context degrades, too much to review
3. Logical Cohesion (Complete Capability)
✅ COHESIVE: [T001] Implement Comment Creation
- Model + Repository + Service + API endpoint + Tests
- Result: Users CAN create comments (working feature)
❌ NOT COHESIVE: [T001] Create All Model Definitions
- Comment, Attachment, Notification, Tag models
- Result: 4 models exist but NO working features
4. Independent or Explicit Dependencies
✅ EXPLICIT DEPENDENCIES: [T005] Create Comment API Endpoints
- Depends on: T003 (CommentService), T004 (CommentSchema)
- Integration: Import from src/services/comment_service.py
❌ UNCLEAR: [T005] Create API Endpoints
- Depends on: "backend stuff"
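Explicit dependencies can even be made machine-checkable. A small sketch (the `Task` record and `ready_tasks` helper are illustrative, not part of any particular tool):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    title: str
    depends_on: list = field(default_factory=list)

def ready_tasks(tasks, done):
    """Return tasks not yet done whose declared dependencies are all complete."""
    return [t for t in tasks if t.id not in done and set(t.depends_on) <= done]

tasks = [
    Task("T003", "CommentService"),
    Task("T004", "CommentSchema"),
    Task("T005", "Comment API Endpoints", depends_on=["T003", "T004"]),
]

# T005 only becomes ready once both of its declared dependencies are done:
assert [t.id for t in ready_tasks(tasks, set())] == ["T003", "T004"]
assert [t.id for t in ready_tasks(tasks, {"T003", "T004"})] == ["T005"]
```

"Depends on: backend stuff" can't be expressed in a structure like this — which is exactly the test for whether your dependencies are explicit.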
Why split at all? Splitting pays off in four ways:
1. Scope Management: Break 8-hour features into 2-hour chunks for fresh AI context and reviewable PRs
2. Parallel Work: Enable multiple developers to work simultaneously on independent tasks
3. Risk Isolation: Test complex areas (like external API integrations) separately before integration
4. Clear Milestones: Demonstrate incremental progress with working features
Example:
❌ ONE BIG TASK: [T001] Build Complete Commenting System (8 hours)
- No demo until all 8 hours are done; the entire feature is blocked if issues occur
✅ PHASED TASKS:
[T001] Comment Model + Repository (2 hours)
[T002] Comment Service + Validation (2 hours)
[T003] Comment API + Integration Tests (2 hours)
[T004] Authorization + E2E Tests (2 hours)
- 4 reviewable PRs, 4 milestones, can ship T001-T002 early
But splitting has a cost. Every new task requires:
- AI re-reading specification
- Analyzing codebase for patterns
- Reconstructing mental model
- Integration overhead
Time Cost: ~10-15 minutes setup per task
Example:
❌ OVER-SPLIT (4 tasks):
- 75 min implementation + 40 min setup = 115 min (53% overhead)
✅ VERTICAL SLICE (1 task):
- 75 min implementation + 10 min setup = 85 min (13% overhead)
Savings: 30 minutes (26% faster)
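The overhead arithmetic above generalizes. A quick sketch, using the document's assumption of roughly 10 minutes of setup per task, measured against implementation time:

```python
def overhead_pct(impl_min: int, n_tasks: int, setup_per_task: int = 10) -> int:
    """Per-task setup time as a percentage of implementation time.

    Assumes ~10 min of setup (re-reading specs, rebuilding context) per task.
    """
    return round(100 * n_tasks * setup_per_task / impl_min)

# 75 minutes of implementation work:
assert overhead_pct(75, n_tasks=4) == 53  # over-split: 40 min setup, 53% overhead
assert overhead_pct(75, n_tasks=1) == 13  # vertical slice: 10 min setup, 13% overhead
```

Running the same numbers for your own task breakdown makes the split-vs-merge decision concrete instead of intuitive.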
Key Insight: Only split tasks when benefits (scope management, parallelism, risk isolation) exceed context switching costs.
When to split:
1. Scope Genuinely Large (>2 hours)
- Split 6-hour feature into three 2-hour tasks
- Each task delivers testable milestone
2. Natural Feature Boundaries
[T001] Comment Creation (90 min) - Users can add comments
[T002] Comment Deletion (75 min) - Users can delete comments
[T003] Comment Editing (60 min) - Users can edit comments
Each is independently valuable and testable
3. Technical Complexity Warrants Isolation
[T001] Payment Model + Validation (90 min) - Low risk
[T002] Stripe Integration (2 hours) - Complex, worth isolating
[T003] Payment API (90 min) - Integrate tested components
4. Parallel Opportunities
AFTER: Comment Model exists
[T002] CommentRepository (90 min) - Developer A
[T003] CommentSchema (45 min) - Developer B
Both can run in parallel (45 min savings)
When NOT to split:
1. Implementing Technical Layers Separately
❌ BAD (By Layer):
[T001] Add Comment model (20 min)
[T002] Add CommentRepository (30 min)
[T003] Add CommentService (30 min)
[T004] Add Comment API (40 min)
→ No working feature until T004, 4× context switching
✅ GOOD (Vertical Slice):
[T001] Implement Comment Creation (90 min)
→ Working feature, end-to-end testable immediately
2. Adding Single Field
❌ BAD: Split into 5 tasks (model, schema, repository, API, tests)
✅ GOOD: One task "Add Task Priority Field Throughout" (60 min)
3. Simple CRUD
❌ BAD: One task per endpoint (GET, POST, PATCH, DELETE)
✅ GOOD: One task "Implement Task CRUD API" (90 min)
Example 1: Vertical Slice
[T001] Implement Comment Creation (60-90 min)
FILES: model, repository, service, schema, API, tests
ACCEPTANCE:
- POST /api/tasks/{task_id}/comments creates comment
- Content validation (1-5000 chars)
- Authorization (user must own task)
- Tests pass with 90%+ coverage
RESULT: Users CAN create comments (working feature)
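The validation and authorization criteria above are the testable core of this slice. A service-layer sketch, deliberately framework-free — the function and exception names are illustrative, and a real implementation would sit behind the POST endpoint:

```python
class ValidationError(Exception):
    """Maps to a 422 response in the API layer."""

class Forbidden(Exception):
    """Maps to a 403 response in the API layer."""

def create_comment(task_owner_id: int, current_user_id: int, content: str) -> dict:
    """Enforce the slice's acceptance criteria: content length and task ownership."""
    if not 1 <= len(content) <= 5000:
        raise ValidationError("content must be 1-5000 characters")
    if current_user_id != task_owner_id:
        raise Forbidden("user must own the task")
    return {"author_id": current_user_id, "content": content}
```

Because the slice is vertical, this logic, its endpoint, and its tests all land in the same task — the criteria are checkable the moment T001 is done.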
Example 2: Related Functionality
[T002] Implement Comment Listing (60 min)
FILES: repository methods, API endpoints, tests
ACCEPTANCE:
- GET /api/tasks/{task_id}/comments returns list
- GET /api/comments/{id} returns single comment
- Pagination with skip/limit
- Authorization verified
RESULT: Users CAN view comments (working feature)
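The skip/limit pagination criterion reduces to a one-line slice at the repository layer. A minimal sketch with an in-memory list standing in for a query:

```python
def list_comments(comments: list, skip: int = 0, limit: int = 20) -> list:
    """skip/limit pagination, as in GET /api/tasks/{task_id}/comments."""
    return comments[skip : skip + limit]

comments = [{"id": i} for i in range(1, 8)]  # 7 comments
assert list_comments(comments, skip=0, limit=3) == [{"id": 1}, {"id": 2}, {"id": 3}]
assert list_comments(comments, skip=5, limit=3) == [{"id": 6}, {"id": 7}]
```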
Example 3: Feature + Security
[T003] Implement Comment Deletion (75 min)
FILES: service, API endpoint, tests
ACCEPTANCE:
- DELETE /api/comments/{id} removes comment
- Author can delete (authorization)
- Task owner can delete any comment
- Non-owners get 403 Forbidden
- Comprehensive auth tests
RESULT: Secure deletion (working feature)
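The three authorization criteria above (author can delete, task owner can delete, everyone else gets 403) collapse into one rule that the auth tests then exercise from every angle. A sketch — the function name is illustrative:

```python
def can_delete(comment_author_id: int, task_owner_id: int, user_id: int) -> bool:
    """The comment's author or the task's owner may delete; anyone else gets 403."""
    return user_id in (comment_author_id, task_owner_id)

assert can_delete(comment_author_id=1, task_owner_id=2, user_id=1)      # author
assert can_delete(comment_author_id=1, task_owner_id=2, user_id=2)      # task owner
assert not can_delete(comment_author_id=1, task_owner_id=2, user_id=3)  # 403 Forbidden
```

Keeping the rule in one place is what makes "comprehensive auth tests" achievable inside a 75-minute task.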
Example 1: Over-Split Model
❌ BAD TASKS (8 tasks, 45 min + 8× setup):
[T001] Create Comment class (10 min)
[T002] Add id field (5 min)
[T003] Add task_id field (5 min)
[T004] Add user_id field (5 min)
[T005] Add content field (5 min)
[T006] Add created_at field (5 min)
[T007] Add task relationship (5 min)
[T008] Add author relationship (5 min)
Problems: Ridiculous granularity, 8× context switching
✅ BETTER (1 task): [T001] Create Comment Model (45 min)
- Complete model with all fields and relationships
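For scale: the entire eight-task plan above is a single, short class. A sketch using a plain dataclass — in a real project this would likely be an ORM model, with the last two fields as relationship declarations:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Comment:
    """All of T001-T008 above, written in one sitting."""
    id: int
    task_id: int
    user_id: int
    content: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # In an ORM these two would be relationship() declarations:
    task: Optional[object] = None
    author: Optional[object] = None
```

Eight context switches to produce roughly fifteen lines of code is the clearest sign of over-splitting.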
Example 2: Over-Split Endpoint
❌ BAD TASKS (7 tasks, 100 min + 7× setup):
[T001] Create router file (5 min)
[T002] Add route signature (10 min)
[T003] Add request validation (15 min)
[T004] Add business logic (20 min)
[T005] Add response formatting (10 min)
[T006] Add error handling (15 min)
[T007] Add tests (25 min)
Problems: Splitting ONE endpoint, can't test until T007
✅ BETTER (1 task): [T001] Implement POST /api/comments Endpoint (90 min)
- Complete endpoint with validation, logic, errors, tests
Summary: Aim for 45-120 minute tasks that deliver complete, working features. Avoid over-splitting (excessive context switching) and under-splitting (loss of focus). When in doubt, favor vertical slices over horizontal layers.
We've covered the principles, patterns, and pitfalls of atomic task design. Let's consolidate what you've learned.
The Four Pillars of Atomic Tasks:
- Clear, Testable Criteria - No vague goals like "works well" or "code is clean." Define observable, measurable outcomes with specific tests, coverage targets, and type-checking requirements.
- 45-120 Minute Scope - The sweet spot that balances setup overhead with focus. Below 45 minutes wastes time on context switching; above 120 minutes risks fatigue and context degradation.
- Complete Logical Cohesion - Deliver working features, not isolated technical artifacts. A task should result in something users can interact with or developers can test end-to-end.
- Explicit Dependencies - Declare what each task needs and what it produces. No hidden blockers, no unclear integration points.
Golden Rules to Apply:
- Favor Vertical Slices Over Horizontal Layers - Build complete features (model → repository → service → API → tests) rather than all models, then all repositories, then all services.
- Split When Benefits Exceed Costs - Only divide tasks when scope management, parallelization, or risk isolation genuinely outweigh the 10-15 minute context switching penalty.
- Deliver Working Features, Not Technical Artifacts - Each task should produce something demonstrable and testable, not just "all the models" or "database layer."
- Aim for Testable Milestones - Every task completion should be verifiable with passing tests, working endpoints, or observable behavior.
Final Insight:
Atomic doesn't mean tiny—it means indivisible without losing value. A 90-minute vertical slice that ships a complete, working feature is far more atomic than eight 5-minute micro-tasks that deliver nothing useful until all eight complete. Master this balance, and you'll design tasks that keep AI focused, reviews manageable, and progress steady.
