8 min read, 1410 words, last updated: 2025/12/29

Spec-Driven Development with Multi-Agent Systems: Documentation as Foundation

When implementing spec-driven, test-driven multi-agent workflows for complex existing applications, one critical question emerges: should you first document all existing functionality and the current state before proceeding? The answer is yes, but not in the traditional sense of "writing documentation."

Introduction

Modern software development increasingly relies on multi-agent systems to handle complex workflows, from code generation to testing and deployment. However, when applying these approaches to legacy systems, particularly consumer-facing (B2C, sometimes written "ToC") applications with accumulated complexity, teams face a fundamental challenge: how do you enable agents to work effectively with systems they've never seen before?

This article explores why specification-driven development requires a structured approach to documenting current system behavior, and how to implement this documentation strategy to support both human developers and AI agents.

The Context: Why Legacy Systems Break Multi-Agent Workflows

The Multi-Agent Prerequisite

Multi-agent systems rest on one prerequisite: each agent can only work on a unit whose behavior is specified well enough to be understood in isolation. Unlike human developers, who can read code and infer the business logic behind it, agents require explicit:

Specifications
Contracts
Invariants
Test oracles

The Legacy System Reality

Most existing consumer-facing applications suffer from common issues:

Accumulated functionality: Features built incrementally without comprehensive design
Documentation debt: Real behavior ≠ README ≠ Product Manager's memory
Hidden rules: Business logic embedded in:
- Conditional branches
- Magic numbers
- Historical bug workarounds
- Implicit assumptions

The Inevitable Failure Pattern

Without proper documentation groundwork, spec-driven multi-agent workflows typically fail in predictable ways:

Incomplete specifications: Missing edge cases and implicit behaviors
Unstable tests: Tests that don't capture real system behavior
Agent conflicts: One agent fixes one piece of functionality while breaking another
Human intervention loops: Developers forced back into firefighting mode

Core Concepts: The Four Pillars of Current-State Documentation

The solution isn't traditional comprehensive documentation, but rather a structured "archeological excavation" focused on enabling multi-agent workflows.

1. Feature Inventory (Not Feature Documentation)

Instead of describing "what we built," document what exists:

## Feature Modules
- User registration / authentication
- Content browsing
- Order processing
- Payment handling
- Refund management
- Push notifications
- Risk control / limitation logic

For each module, answer only three questions:

### Order Processing
- Under what conditions can users place orders?
- What internal state changes occur after an order is placed successfully?
- What are the primary known failure modes?

Key principles:

❌ No flowcharts or implementation details
❌ No architectural explanations
✅ Only factual behavioral descriptions

2. Critical Business Invariants

These form the lifeline of spec-driven + TDD + multi-agent systems:

## System Invariants
- Paid orders cannot revert to "unpaid" status
- One user can only have one active order at any time
- Refunded orders cannot initiate additional refunds
- Banned users cannot trigger any write operations

These statements are:

Code-independent: Don't rely on implementation details
UI-agnostic: Don't depend on interface specifics
Highly stable: Rarely change over time
Test-friendly: Easily converted to property tests and regression tests
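To illustrate the "test-friendly" point, here is a minimal sketch of how two of the invariants above could become a randomized property check. The `Order` model, its status names, and the `holds_invariants` helper are hypothetical and exist only for this example; a real system would exercise its own order service, and a library such as Hypothesis could replace the hand-rolled random loop.

```python
import random
from dataclasses import dataclass


@dataclass
class Order:
    """Toy order model; statuses and methods are hypothetical."""
    status: str = "unpaid"
    refund_count: int = 0

    def pay(self) -> None:
        if self.status == "unpaid":
            self.status = "paid"

    def refund(self) -> None:
        # Only a paid order may be refunded, and only once.
        if self.status == "paid":
            self.status = "refunded"
            self.refund_count += 1


class BuggyOrder(Order):
    """Deliberately broken variant: an already refunded order can be refunded again."""

    def refund(self) -> None:
        if self.status in ("paid", "refunded"):
            self.status = "refunded"
            self.refund_count += 1


def holds_invariants(order_cls, seed: int, steps: int = 50) -> bool:
    """Apply random actions and check the invariants after every step."""
    rng = random.Random(seed)
    order = order_cls()
    was_paid = False
    for _ in range(steps):
        action = rng.choice(["pay", "refund"])
        getattr(order, action)()
        was_paid = was_paid or order.status == "paid"
        if was_paid and order.status == "unpaid":
            return False  # paid orders cannot revert to "unpaid"
        if order.refund_count > 1:
            return False  # refunded orders cannot be refunded again
    return True
```

Running `holds_invariants(Order, seed)` over many seeds acts as a regression guard, while the `BuggyOrder` variant shows the kind of violation such a check is meant to catch.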

3. Accepted Anomaly Catalog

This captures the messy but accepted reality of production systems:

## Known but Accepted Exceptions
- A successful payment whose callback fails may appear as a failure in the frontend (requires manual reconciliation)
- Network instability may cause duplicate requests (backend uses idempotency keys)
- Legacy user records with missing fields trigger compatibility logic

Why this matters:

Agents naturally try to "fix" inelegant behavior
Some system "messiness" is deliberately accepted
Without documentation, agents will "helpfully" break production logic
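One way to make the catalog usable by agents is to keep it in a machine-readable form and check proposed changes against it. The sketch below is an assumption about how that could look: the `AcceptedAnomaly` type, the identifiers, the module assignments, and `needs_human_review` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptedAnomaly:
    anomaly_id: str   # hypothetical identifier scheme
    module: str       # module assignment is an assumption for this example
    description: str


CATALOG = (
    AcceptedAnomaly("payment-callback-failure", "payment",
                    "Successful payment may appear failed; reconciled manually"),
    AcceptedAnomaly("duplicate-requests", "order",
                    "Duplicate requests occur; backend deduplicates by idempotency key"),
    AcceptedAnomaly("legacy-missing-fields", "user",
                    "Legacy records with missing fields go through compatibility logic"),
)


def needs_human_review(touched_modules: set[str]) -> list[AcceptedAnomaly]:
    """Return the accepted anomalies in the modules a proposed change touches."""
    return [a for a in CATALOG if a.module in touched_modules]
```

An orchestrator could call `needs_human_review({"payment"})` before accepting an agent's patch and route any non-empty result to a person instead of merging automatically.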

4. Authority Hierarchy

Define conflict resolution rules when specifications, tests, and human knowledge disagree:

## Truth Source Priority
1. Production behavior (logs + data)
2. Existing core tests (if any)
3. Explicitly written specifications
4. Human memory / verbal agreements

This hierarchy becomes critical for multi-agent conflict resolution.
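As a rough sketch, the priority list can be encoded so that a coordinator resolves disagreements mechanically. The `Source` enum and `resolve` function are hypothetical names; only the ordering comes from the list above.

```python
from enum import IntEnum


class Source(IntEnum):
    """Truth sources; a lower value means higher authority."""
    PRODUCTION_BEHAVIOR = 1
    CORE_TESTS = 2
    WRITTEN_SPEC = 3
    HUMAN_MEMORY = 4


def resolve(claims: dict[Source, str]) -> tuple[Source, str]:
    """Pick the claim backed by the most authoritative source present."""
    if not claims:
        raise ValueError("no claims to resolve")
    winner = min(claims)  # IntEnum members compare by their integer value
    return winner, claims[winner]
```

For example, if the written spec says one active order per user but production data shows several, `resolve` returns the production claim, and the discrepancy can then be recorded rather than silently "fixed".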

Analysis: The Three-Layer Specification Architecture

The most crucial architectural decision involves how to handle specification evolution. The answer is a layered approach that separates historical facts from future intentions.

Layer 1: Current State Spec (Frozen, Read-Only)

Purpose: Historical system behavior snapshot

spec/
 ├─ current/
 │   ├─ 2025-01/
 │   │   ├─ order.md
 │   │   ├─ payment.md
 │   │   └─ invariants.md

Rules:

✅ Only update when facts actually change
❌ No "polish" modifications
❌ No completeness requirements
✅ Explicit "uncertain/uncovered" sections allowed
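Given the directory layout above, tooling often needs to find the most recent frozen snapshot. The following is a minimal sketch, assuming snapshot directories are always named `YYYY-MM`; `latest_snapshot` is a hypothetical helper, not part of any existing tool.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: snapshot directories are named like "2025-01".
SNAPSHOT_NAME = re.compile(r"\d{4}-\d{2}")


def latest_snapshot(current_root: Path) -> Optional[Path]:
    """Return the newest YYYY-MM snapshot directory, or None if there is none."""
    candidates = [
        p for p in current_root.iterdir()
        if p.is_dir() and SNAPSHOT_NAME.fullmatch(p.name)
    ]
    # Zero-padded YYYY-MM names sort chronologically as plain strings.
    return max(candidates, key=lambda p: p.name, default=None)
```

Calling `latest_snapshot(Path("spec/current"))` would give agents a single, unambiguous "what is" reference to read from.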

Layer 2: Target Spec (Living, Modifiable)

Purpose: Desired future system behavior

spec/
 ├─ target/
 │   ├─ order-v2.md
 │   ├─ payment-refactor.md

All new requirements, refactoring goals, and behavior modifications go here. This serves as:

Primary agent execution input
Test design source
Code review reference

Layer 3: Difference Documentation (Bridge)

Purpose: Explicit change management

## Diff: current → target
 
- Current: Users can have multiple pending orders
- Target: Only one active order allowed
- Risks:
  - Legacy data compatibility
  - Concurrent request handling

This layer provides:

Human decision points
Agent risk awareness
Natural TDD test case generation
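To show how a diff entry can seed test cases, here is a small sketch that turns the target statement and each listed risk into a planned test name. `DiffEntry`, `slug`, and `planned_test_names` are hypothetical; the naming scheme is an assumption, and the generated names are only placeholders for tests a human or agent still has to write.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DiffEntry:
    current: str
    target: str
    risks: list[str] = field(default_factory=list)


def slug(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def planned_test_names(entry: DiffEntry) -> list[str]:
    """One test for the target behavior, plus one per documented risk."""
    names = [f"test_target_{slug(entry.target)}"]
    names += [f"test_risk_{slug(risk)}" for risk in entry.risks]
    return names
```

Applied to the diff above, this yields one test for "only one active order allowed" and one each for legacy data compatibility and concurrent request handling, which gives the TDD cycle a concrete starting list.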

Why This Architecture Works

The three-layer approach solves critical problems:

Prevents reference loss: Historical behavior remains accessible for regression testing
Enables change tracking: Clear distinction between "what is" vs "what should be"
Supports agent reasoning: Agents understand that they are building the future, not explaining the past

Implementation Strategy: Practical Rollout

Phase 1: Minimum Viable Documentation

Rather than attempting comprehensive documentation, start with immediate needs:

1. Select one module requiring near-term changes
2. Document only that module using the four pillars:
   - Feature boundaries
   - Invariants
   - Accepted anomalies
   - Authority hierarchy
3. Create: spec/current-state.md
4. Then begin: test writing and agent integration
5. Repeat for subsequent modules

Phase 2: Specification Evolution Process

New requirements workflow:

New requirement → Write target spec
Agent/human development → Code changes + tests
Production deployment → Behavior confirmation
Generate new current snapshot → Add time-stamped version (don't overwrite)
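The "don't overwrite" rule in the last step can be enforced in tooling. This is a minimal sketch, assuming snapshots live in sibling directories under one root; `create_snapshot` is a hypothetical helper that copies the previous snapshot as a starting point and refuses to touch an existing one.

```python
import shutil
from pathlib import Path


def create_snapshot(current_root: Path, previous: str, new: str) -> Path:
    """Copy the previous snapshot to a new time-stamped directory.

    Raises FileExistsError instead of overwriting an existing snapshot.
    """
    source = current_root / previous
    destination = current_root / new
    if destination.exists():
        raise FileExistsError(f"snapshot already exists: {destination}")
    shutil.copytree(source, destination)
    return destination
```

After copying, only the files whose described behavior actually changed in production would be edited in the new directory; the older snapshot stays untouched for regression reference.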

Phase 3: Error Handling

When current specs are discovered to be wrong:

Scenario A: Initial misunderstanding

## Addendum (Discovered 2025-02)
Previous description omitted:
- Under condition XXX, system actually behaves YYY

Scenario B: System actually changed

Create new snapshot: spec/current/2025-03/

Implications and Best Practices

For Multi-Agent Workflows

Agent Specialization: Different agents can focus on different specification layers
Conflict Resolution: Clear authority hierarchy prevents agent conflicts
Regression Prevention: Historical specs enable "behavior preservation" agents

For Development Teams

Reduced Context Switching: Specifications serve as shared understanding
Safer Refactoring: Clear change boundaries and risk documentation
Onboarding Acceleration: New team members understand both current and target states

For Legacy System Migration

Incremental Modernization: Module-by-module specification and refactoring
Risk Management: Explicit anomaly documentation prevents "fixing" accepted behavior
Audit Trail: Complete change history for compliance and debugging

Code Example: Specification Template

# Order Processing Module - Current State (2025-01)
 
## Feature Boundaries
- Order creation: Authenticated users with valid payment methods
- Order modification: Within 30-minute window, status-dependent
- Order cancellation: User-initiated or system timeout
 
## Business Invariants
- Order.total_amount == sum(OrderItem.price * OrderItem.quantity)
- Order.status transitions: draft → pending → paid → fulfilled
- Cancelled orders cannot transition to any other status
 
## Known Anomalies
- Mobile app may show "processing" for up to 5 minutes after payment
- Concurrent order attempts may create duplicate drafts (cleaned by daily job)
 
## Authority Hierarchy
1. Production order_events table
2. Existing payment integration tests
3. This specification document
4. Product team requirements
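Because the template uses a fixed set of section headings, a spec file can be checked before it is handed to an agent. The sketch below assumes the four `##` headings shown in the template; `parse_sections` and `missing_sections` are hypothetical helpers, and the parser handles only simple `-` and numbered list items.

```python
import re

REQUIRED_SECTIONS = (
    "Feature Boundaries",
    "Business Invariants",
    "Known Anomalies",
    "Authority Hierarchy",
)

# Matches "- item" and "1. item" style list lines.
LIST_ITEM = re.compile(r"(?:-|\d+\.)\s+(.*)")


def parse_sections(markdown: str) -> dict[str, list[str]]:
    """Map each '## ' heading to the list items that follow it."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LIST_ITEM.match(line)
            if match:
                sections[current].append(match.group(1))
    return sections


def missing_sections(markdown: str) -> list[str]:
    """Return required sections that are absent or have no list items."""
    sections = parse_sections(markdown)
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]
```

A check like `missing_sections(text)` could run in CI so that an incomplete current-state spec is flagged before any agent treats it as input.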

Conclusion

Successfully implementing spec-driven development with multi-agent systems requires treating documentation not as a one-time effort, but as foundational infrastructure. The key insights are:

Documentation must serve agents first, humans second: Agents require explicit, structured information that humans can often infer
Historical accuracy trumps completeness: Better to have accurate partial documentation than comprehensive but incorrect specifications
Layered specifications enable evolution: Separating "what is" from "what should be" prevents the loss of crucial historical context

The three-layer specification architecture—current state (frozen), target state (living), and difference documentation (bridge)—provides a robust foundation for multi-agent workflows while maintaining the historical context necessary for safe system evolution.

For teams facing similar challenges with legacy systems, the recommendation is clear: invest in structured current-state documentation first. It's not just documentation—it's the foundation that makes everything else possible.

The bottom line: Spec-driven development doesn't begin with writing specifications—it begins with understanding what you actually have.