Makuhari Development Corporation
11 min read, 2002 words, last updated: 2026/2/15

Introduction

Most discussions about AI memory stop at retrieval. The common playbook: embed your conversation history, dump it into a vector store, and let semantic search surface relevant context when needed. It's a reasonable starting point—but if you're building a system that needs to evolve over months or years, it quickly shows its limits.

This post explores a different model: Persistent Intelligence (PI)—a memory architecture where the AI doesn't just retrieve memories, but actively reasons about them, maintains them, and decides what's worth keeping. We'll walk through the full design, from schema to agent roles to git versioning strategy, using a monorepo-based project as the concrete target.


Background: Why Retrieval-Only Memory Eventually Fails

The standard vector-store memory approach works well for short-horizon tasks. The problems emerge over time:

Passive accumulation. Every interaction gets logged. Nothing gets pruned. The store grows and degrades in signal-to-noise ratio. After six months, it's an archaeological site rather than a working memory.

No lifecycle management. There's no mechanism to ask: "Is this memory still valid? Has it been contradicted? Should it decay?" Without this, stale beliefs persist indefinitely alongside fresh ones.

No distinction between memory types. A one-off debugging note from last week is stored with the same weight as a core architectural principle you've held for a year. The agent has no way to know which is which.

No agent ownership. The system stores what happens to it, but the agent never reflects: "Does my current understanding update or invalidate a previous belief?"

These aren't edge cases—they're the core problem for any AI system expected to serve as a long-term collaborative partner across multiple projects.


The PI Model: Memory as Policy, Not Storage

The key reframe in PI design is this:

Memory is not storage. Memory is policy.

A memory system with policy means the agent knows what to do with a piece of information—whether to promote it, merge it with existing knowledge, flag it for review, or let it decay.

Three Defining Properties

1. Memories are reasoning objects, not blobs.

Each memory entry carries structured metadata that allows the agent to evaluate it, not just retrieve it:

{
  "id": "mem_20260210_001",
  "layer": "L2",
  "scope": { "type": "global" },
  "tags": ["workflow", "architecture"],
  "statement": "The team consistently prefers specification-first, document-driven development to enable multi-agent reproducibility.",
  "evidence": [
    { "type": "artifact", "ref": "docs/arch/", "note": "Observed across multiple project setups" }
  ],
  "confidence": 0.86,
  "stability": "semi-stable",
  "created_at": "2026-02-10",
  "updated_at": "2026-02-10",
  "review": {
    "cadence_days": 30,
    "next_review": "2026-03-12",
    "change_condition": "If 3+ independent projects deviate from this workflow, downgrade or deprecate"
  },
  "status": "active",
  "supersedes": [],
  "superseded_by": []
}

The agent can reason about confidence, stability, and change_condition. It can decide whether to promote or deprecate. This is qualitatively different from a cosine similarity score.
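To make this concrete, here is a minimal sketch of how a maintainer-style check might reason over such a record. The function name, thresholds, and action labels are illustrative assumptions, not part of any fixed PI spec:

```python
from datetime import date

# Hypothetical maintenance check over a PI memory record (schema above).
# The 0.5 threshold and the action labels are assumed, illustrative knobs.
def evaluate(record: dict, today: date) -> str:
    """Return a suggested maintenance action for one memory record."""
    if record["status"] != "active":
        return "skip"                      # only active memories get maintained
    next_review = date.fromisoformat(record["review"]["next_review"])
    if today >= next_review:
        return "review"                    # past cadence: re-check change_condition
    if record["confidence"] < 0.5:
        return "flag_low_confidence"       # still active, but weakly supported
    return "keep"

record = {
    "status": "active",
    "confidence": 0.86,
    "review": {"next_review": "2026-03-12"},
}
print(evaluate(record, date(2026, 2, 15)))   # keep
print(evaluate(record, date(2026, 3, 20)))   # review
```

The point is not the specific thresholds but that the decision is computed from the record's own metadata rather than from retrieval similarity.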

2. Memories have a lifecycle—they are not append-only.

A PI system includes operations for:

  • consolidate: merge duplicate or overlapping memories
  • invalidate: mark memories that have been contradicted
  • decay: lower confidence over time for unverified beliefs
  • promote: elevate a project-level pattern to a domain or global belief when sufficient evidence accumulates
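As a sketch of one of these operations, a decay pass over overdue memories might look like the following. The 0.9 multiplier and 0.3 deprecation floor are assumed knobs, not prescribed values:

```python
from datetime import date

# Illustrative decay pass: lower confidence on active memories whose review
# date has passed without verification. Multiplier and floor are assumptions.
def decay(memories: list[dict], today: date) -> list[dict]:
    for m in memories:
        overdue = date.fromisoformat(m["review"]["next_review"]) < today
        if m["status"] == "active" and overdue:
            m["confidence"] = round(m["confidence"] * 0.9, 3)
            if m["confidence"] < 0.3:
                m["status"] = "deprecated"   # too stale to keep trusting
    return memories
```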

3. The agent has memory maintenance responsibility.

After every significant task, the agent is expected to ask: Does what I just learned change anything I believed before? This reflection loop is what separates a PI system from a sophisticated logging system.


Core Concepts: The Three-Layer Memory Architecture

PI memory is organized into three layers with clearly different volatility profiles and governance rules.

L1 — Identity Memory (Rarely Changes)

This layer captures stable, global truths: preferences, principles, long-term constraints. Examples:

  • "The user prefers lightweight, file-based tooling over database-heavy stacks for personal projects."
  • "Security review is required before any feature that touches user authentication."

Governance:

  • Extremely small (target: under 40 records)
  • Requires 3+ independent evidence instances to promote anything here
  • Never auto-written; only a human or a designated curator agent can modify this layer
  • Loaded in full on every task start—no retrieval needed

L2 — Domain/Playbook Memory (Evolves Deliberately)

This is the productive core of PI. It captures how work gets done: architectural patterns, recurring workflows, domain knowledge, lessons learned across projects.

Examples:

  • "React frontend services in this monorepo use a consistent MVVM pattern with colocated test files."
  • "For geospatial features, PostGIS queries are preferred over in-memory filtering for datasets over 10k records."

Governance:

  • Capped (target: under 300 records)
  • Requires 2+ evidence instances and confidence ≥ 0.75 to be added
  • Written only through the patch/curator workflow (never directly)
  • Loaded selectively based on task tags and project scope

L3 — Project/Session Memory (Freely Written, Allowed to Decay)

Ephemeral project facts, recent decisions, active hypotheses. This is where the traditional vector store approach is appropriate—the noise tolerance is high and the data turnover is fast.

Governance:

  • Up to 2000 records per project
  • Can be auto-written
  • Decays after 45 days by default
  • Stored in SQLite or a local vector index; does not need to be in the primary JSONL files
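The SQLite variant of an L3 store can be sketched in a few lines. The column set loosely mirrors the memory schema above; the exact columns and the hardcoded cutoff are assumptions:

```python
import sqlite3

# Minimal sketch of the L3 store backing .pi/index/l3.sqlite.
conn = sqlite3.connect(":memory:")   # in practice: ".pi/index/l3.sqlite"
conn.execute("""
    CREATE TABLE IF NOT EXISTS l3_memory (
        id           TEXT PRIMARY KEY,
        project_slug TEXT NOT NULL,
        statement    TEXT NOT NULL,
        confidence   REAL,
        created_at   TEXT
    )
""")
conn.execute(
    "INSERT INTO l3_memory VALUES (?, ?, ?, ?, ?)",
    ("mem_l3_001", "alpha-service",
     "Hypothesis: cache misses drive p99 latency spikes", 0.6, "2026-02-10"),
)

# Decay pass: drop records older than the 45-day window. The cutoff would
# normally be computed as today minus 45 days; it is hardcoded here.
cutoff = "2026-01-01"
conn.execute("DELETE FROM l3_memory WHERE created_at < ?", (cutoff,))
```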

Architecture: Four Agent Roles

A common failure mode in memory system design is giving a single agent too many responsibilities. The PI model separates memory handling across four distinct roles:

1. Capturer

Runs at the end of every task. Its job is purely extractive: pull out candidate facts, decisions, and patterns from the conversation or artifact and write them to .pi/inbox/captured.jsonl.

It does not judge quality or layer placement. It only captures.

2. Curator

Runs periodically (daily or weekly). Reads the inbox, evaluates candidates, assigns confidence scores and layer placement, and outputs a structured patch—never modifying the memory files directly.

{
  "patch_id": "patch_20260210_001",
  "ops": [
    {
      "op": "add",
      "target": ".pi/memory/L2.playbooks.jsonl",
      "record": { "...": "..." }
    },
    {
      "op": "supersede",
      "from": "mem_20250101_002",
      "to": "mem_20260210_001"
    },
    {
      "op": "deprecate",
      "id": "mem_20241212_007",
      "reason": "contradicted by recent architectural decisions"
    }
  ],
  "generated_by": "curator",
  "created_at": "2026-02-10"
}

3. Maintainer

Also runs periodically. Focuses on consolidation: merging duplicate entries, lowering confidence on unreviewed memories past their review date, and scheduling deprecations for beliefs that have been contradicted.

4. Retriever

Runs at the start of every task. Its output is a context package:

  • runtime/context.md: a human-readable summary of relevant active memories
  • runtime/selected_memory.json: structured memory objects for the downstream agent to use

The retriever always loads L1 in full, selects L2 records by tag and project scope, and pulls L3 records from the past 7–14 days based on project slug.

The execution agent (the one that actually writes code or does the work) only reads memory. It never writes it. This separation is a hard boundary.
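The retriever's selection pass described above can be sketched in one function. JSONL parsing is elided, record fields follow the schema shown earlier, and the project field on L3 records is an assumption:

```python
from datetime import date, timedelta

# Sketch of the Retriever's selection logic: L1 in full, L2 by tag overlap
# and active status, L3 by project slug and recency window.
def build_context(l1, l2, l3, task_tags, project_slug, today, window_days=14):
    selected = list(l1)                                   # L1 always loads whole
    selected += [m for m in l2
                 if m["status"] == "active"
                 and set(m["tags"]) & set(task_tags)]     # tag-scoped L2
    cutoff = (today - timedelta(days=window_days)).isoformat()
    selected += [m for m in l3
                 if m["project"] == project_slug
                 and m["created_at"] >= cutoff]           # recent L3 only
    return selected
```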


Directory Structure

For a monorepo, the PI directory looks like this:

.pi/
  schemas/
    memory.schema.json
    patch.schema.json
  memory/
    L1.identity.jsonl
    L2.playbooks.jsonl
    projects/
      <project-slug>.jsonl
  index/
    l3.sqlite
    embeddings/
  inbox/
    captured.jsonl
  patches/
    patch_20260210_001.json
  reports/
    weekly_summary.md
  policies/
    memory_policy.md
  prompts/
    retriever.md
    curator.md
    maintainer.md
    capturer.md
  runtime/
    context.md
    selected_memory.json
  pi.config.yaml

Each project in the monorepo can declare its own memory scope:

# projects/alpha-service/.pi.project.yaml
project_slug: alpha-service
domains: ["backend", "api-design", "security"]
memory_scope:
  L2_tags: ["architecture", "workflow", "security"]
  L3_paths:
    - "docs/decisions/"
    - "src/alpha-service/"

This lets the retriever automatically know which L2 entries to include for a given task, without the developer having to explain the project context every time.


Configuration: Encoding Policy in Code

The pi.config.yaml file is where governance rules live. A working example:

version: 1
layers:
  L1:
    max_records: 40
    allowed_tags: ["preferences", "identity", "constraints"]
    write_policy: "manual_or_curator_only"
  L2:
    max_records: 300
    allowed_tags: ["workflow", "playbook", "domain", "architecture"]
    write_policy: "curator_or_maintainer_only"
  L3:
    max_records_per_project: 2000
    write_policy: "auto_ok"
 
confidence_rules:
  promote_to_L2:
    requires_evidence_count: 2
    min_confidence: 0.75
  promote_to_L1:
    requires_evidence_count: 3
    min_confidence: 0.85
  deprecate:
    contradiction_evidence_required: 1
 
review_policy:
  semi_stable_days: 30
  stable_days: 120
  decay:
    L3_after_days: 45
 
retrieval:
  always_load:
    - ".pi/memory/L1.identity.jsonl"
  tag_boost:
    workflow: 2.0
    architecture: 1.5
  time_window_days:
    L3_default: 14

This config is the single source of truth for how memories are promoted, reviewed, and expired. Changing it is an intentional governance act, not a side effect of a conversation.
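Enforcing the promotion thresholds from this config might look like the sketch below. The config would normally be loaded with a YAML parser; it is inlined here to keep the example self-contained:

```python
# Sketch of enforcing the confidence_rules section of pi.config.yaml above.
# In practice these values come from the parsed config, not a literal.
RULES = {
    "promote_to_L2": {"requires_evidence_count": 2, "min_confidence": 0.75},
    "promote_to_L1": {"requires_evidence_count": 3, "min_confidence": 0.85},
}

def can_promote(memory: dict, target: str) -> bool:
    """Check a memory record against the promotion rule for L1 or L2."""
    rule = RULES[f"promote_to_{target}"]
    return (len(memory["evidence"]) >= rule["requires_evidence_count"]
            and memory["confidence"] >= rule["min_confidence"])
```

Because the curator calls a check like this rather than embedding thresholds in its prompt, tightening governance is a one-line config change.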


Version Control Strategy: What to Git and What Not To

PI memory absolutely should be version-controlled—but selectively. The guiding principle:

Version-control what affects agent behavior. Do not version-control what can be rebuilt.

Must be in Git

  • pi.config.yaml — governs all agent behavior
  • policies/memory_policy.md — the constitutional rules
  • schemas/*.json — data contracts
  • prompts/*.md — agent prompt templates
  • memory/L1.identity.jsonl and memory/L2.playbooks.jsonl — the long-term truth
  • patches/*.json — the audit trail of every memory change
  • tools/pi_apply_patch.* — the patch application script
  • projects/*/.pi.project.yaml — project scope declarations

Must NOT be in Git

  • runtime/ — ephemeral context packages, rebuilt every task
  • inbox/ — raw unreviewed captures, may contain sensitive data
  • index/ — SQLite and embeddings, fully rebuildable
  • cache/ — transient data
  • reports/ — unless explicitly sanitized for long-term record

A recommended .gitignore addition:

# PI runtime & index (rebuildable)
.pi/runtime/
.pi/index/
.pi/cache/
.pi/inbox/
.pi/reports/

The patch-based workflow means that memory changes go through the same review process as code changes: a curator outputs a patch, the patch gets reviewed (optionally via pull request), and then pi_apply_patch applies it to the JSONL files. The git diff on L2.playbooks.jsonl is your audit trail.


Three Rules Against Memory Drift

As the system evolves, certain failure modes are predictable. Three hard constraints help prevent them:

1. L1/L2 entries must never rely solely on a single conversation as evidence. The evidence reference must point to a durable artifact (a document, a PR, a spec file) or multiple independent instances. This prevents one aberrant conversation from poisoning long-term memory.

2. Every preference or principle must include a change_condition. Without one, beliefs harden into unfalsifiable axioms. The change condition makes the belief falsifiable: "If X happens 3 times in independent projects, this belief is invalidated." This is what keeps the system honest over time.

3. Duplicate statements are not allowed—only supersession. If a new belief replaces an old one, it must explicitly set supersedes: ["old-mem-id"] and mark the old record as superseded_by: ["new-mem-id"]. This maintains a traceable lineage instead of silent contradictions.
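Rule 3 amounts to a small invariant that can be enforced in code. A sketch, assuming in-memory record dicts with the id, status, and lineage fields from the schema above:

```python
# Replace a belief by supersession rather than duplication, linking the
# lineage in both directions so neither record silently contradicts the other.
def supersede(old: dict, new: dict) -> None:
    new.setdefault("supersedes", []).append(old["id"])
    old.setdefault("superseded_by", []).append(new["id"])
    old["status"] = "superseded"
```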


Implications for Long-Running AI Systems

The PI model has broader implications beyond the specific architecture described here.

Memory becomes an interface, not an implementation detail. When memory objects carry explicit metadata—confidence, stability, evidence, change conditions—the agent-memory relationship becomes auditable. You can inspect why the agent holds a particular belief and trace it back to specific events.

Agent cognition becomes a first-class engineering concern. The distinction between "what the agent knows" and "what it can infer" becomes meaningful and tractable. The review cadences and promotion thresholds are engineering decisions with observable consequences.

The gap between human and AI long-term collaboration narrows. A system where the AI can say "Based on 4 projects, I've observed that X; I'm updating my understanding accordingly, and I'm flagging Y as potentially outdated" is qualitatively different from a system that just retrieves similar past conversations. It starts to resemble how a skilled colleague builds institutional knowledge.


Conclusion

The central insight of PI-style memory design is that storing information is the easy part. The hard part—and the part that matters for long-term usefulness—is deciding what information is worth keeping, when it should change, and how it should influence future behavior.

By treating memories as first-class reasoning objects with lifecycle metadata, separating memory maintenance into distinct agent roles, and applying software engineering discipline (schemas, patches, version control, review processes) to the memory layer itself, you get a system that can actually evolve with your work rather than accumulating noise around it.

The MVP is simpler than the full vision: start with JSONL files for L1/L2, a basic retriever that filters by tags and project slug, and a weekly manual review session. The infrastructure—SQLite indices, embedding search, automated curator agents—can be layered in as the core workflow proves its value.

What you get from the start is something more important: a memory system where the agent's beliefs are traceable, auditable, and falsifiable. That's the foundation for AI systems that remain trustworthy over time.
