Makuhari Development Corporation
11 min read, 2121 words, last updated: 2026/1/25

Introduction

When you work with an AI coding agent like Claude Code over days and weeks, you accumulate a growing ecosystem of documents, configuration files, skill definitions, and prompt templates. These files shape every response the agent produces — but until something goes wrong, you rarely stop to ask: which of these files is actually being used? How often? In what order?

This post explores the design and feasibility of a context usage heatmap — a real-time observability tool that tracks which documents and configuration sections are injected into an AI agent's context window, visualizing their usage frequency and order during a working session.

The idea sounds ambitious, but as we'll see, the core insight is surprisingly grounded: you don't need to peek inside the model's reasoning. You only need to watch what you feed it.


Background and Context

The Problem with AI Agent Observability

Traditional software observability tools — logs, traces, metrics — have well-understood instrumentation points. You know when a function is called, what arguments it received, how long it took. The data is precise and deterministic.

AI agents introduce a fundamentally different dynamic. The "execution" is largely opaque. You send a prompt; you get a response. What happened in between is unknowable from the outside. This makes classic APM (Application Performance Monitoring) approaches inapplicable at the model layer.

However, there's an underexplored middle layer that is observable: context construction.

Before an LLM ever processes a single token, a pipeline of decisions has already been made:

  • Which documents to include
  • Which sections to extract
  • Which tools and skills to make available
  • In what order to assemble these pieces

This pipeline is entirely within your control. And that makes it entirely observable.

Why This Matters for AI Coding Agents

In a mature AI coding workflow, a single task might draw from:

  • Architecture specification documents
  • Coding style guides
  • Domain-specific skill definitions
  • System prompt configurations
  • Past context or session notes

Over time, this corpus grows. Some files become critical load-bearing pieces referenced in almost every task. Others become stale and are never included. Without visibility, you have no way to tell them apart.

The practical consequences are real:

  • You can't answer "why did the agent behave differently today?" without knowing what it was given
  • You can't identify unused or redundant documentation cluttering your workspace
  • You can't reason about prompt bloat or context window efficiency

A context usage heatmap directly addresses all three.


Core Concepts

The Fundamental Boundary: Input Visibility vs. Internal Attention

The most important conceptual foundation for this system is understanding what you can and cannot observe.

What you cannot observe:

  • Token-level attention weights
  • Which part of the context the model "focused on" when generating a response
  • Internal reasoning chains
  • Whether a specific sentence influenced the output

These are properties of the model's internal computation. No API, no CLI hook, no clever prompt engineering will expose them. This is not a gap in current tooling — it is a structural property of how large language models work.

What you can always observe:

  • Which documents were selected for inclusion
  • Which sections (headers, chunks) were extracted from those documents
  • The timestamp and order of each inclusion
  • Which skills and tools were made available
  • The human or automated decision that triggered each inclusion

This second list is entirely deterministic. You constructed the context — you have complete provenance over every piece of information that entered it.

"Used" Means "Injected"

This reframing is the conceptual keystone of the entire system.

In a traditional observability context, "used" means "executed" — a function was called, a database row was read. In this system, the operational definition must shift:

"A document section was used" means "it was injected into the context window."

This is not a compromise or an approximation. It is the most honest and defensible definition available. It's also the one that makes engineering tractable.

Once you accept this definition, the system design becomes straightforward. You're building a context construction tracer, not a model internals debugger.

Why Markdown Headers Are the Right Granularity

Documents are the atoms of knowledge in an AI coding workflow, but they're too coarse for useful heatmap visualization. A 300-line specification file that gets included as a single unit tells you very little about which part of the spec was relevant.

Markdown headers provide a natural intermediate granularity:

  • They correspond to human-authored semantic boundaries
  • They're already present in your documents without any instrumentation
  • They map cleanly to the mental model developers use when writing and reading specs

For a single-session analysis (no cross-session comparison required), header-level tracking is sufficient. It lets you make observations like "the ## Error Handling section of the API spec was injected 14 times this session," which is genuinely actionable.

For longer-term tracking where documents evolve, header names are unstable identifiers (a renamed section breaks historical continuity). But for session-scoped analysis, this instability is irrelevant.
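To make header-level tracking concrete, here is one way extraction might look. This is a minimal sketch in Python (the post specifies no implementation language); it handles ATX-style headers and skips headers that appear inside fenced code blocks, since those are code, not document structure:

```python
import re

def extract_headers(markdown: str) -> list[tuple[int, str]]:
    """Return (depth, title) pairs for ATX headers, skipping fenced code blocks."""
    headers = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # toggle on every fence delimiter
            continue
        if in_fence:
            continue
        m = re.match(r"^(#{1,6})\s+(.*)$", stripped)
        if m:
            headers.append((len(m.group(1)), m.group(2).strip()))
    return headers
```

The returned depth (1 for `#`, 2 for `##`, and so on) is what the event schema later refers to as `depth`.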


System Design and Analysis

The Event Model

The core of the system is a simple event stream. Every time a document section enters the context, you emit a structured event:

{
  "ts": 1706266353000,
  "session_id": "session-abc",
  "event": "context_add",
  "doc": "architecture-spec.md",
  "header": "## Data Flow Design",
  "depth": 2,
  "triggered_by": "skill:analyze-dependencies"
}

This schema captures:

  • When it happened (ts)
  • What was included (doc + header)
  • Why it was included (triggered_by)
  • Session scope for aggregation (session_id)

The event log is append-only and requires no persistent database for session-scoped analysis. An in-memory array or a local flat file is sufficient.

Aggregation Layer

From the event stream, two primary aggregations drive the heatmap:

Frequency map: Count of injections per {doc, header} pair within the session window. This answers "what was referenced most?"

Timeline: The ordered sequence of all injection events. This answers "what was the pattern of context construction over time?"

Both are trivially computed from the event log with no external dependencies.

Real-Time UI Architecture

The proposed UI consists of three panels:

Document sidebar — A collapsible tree of all workspace documents and their header structure. Each header node renders with a visual heat indicator (color saturation or a count badge) reflecting its injection frequency in the current session.

Document preview — A rendered view of the currently selected document with active sections highlighted. Sections currently in the context window appear with a distinct highlight; sections that have been injected at least once this session appear with a lighter background.

Timeline panel — A chronological log of context construction events, similar to a browser network waterfall. Each row shows the document, section, timestamp, and triggering source.

Real-time updates flow from the plugin backend to the UI via WebSocket or Server-Sent Events. When a new injection event fires, the UI updates the relevant document's header heat value and appends a row to the timeline.

Plugin Architecture for Claude Code

In a Claude Code plugin context, the implementation follows this shape:

  1. Plugin startup: The plugin process launches a local HTTP server (e.g., on localhost:5177) and opens a browser tab pointing to the UI.

  2. Document scan: On startup, the plugin traverses the workspace and parses all Markdown files, building an in-memory document tree indexed by {path, headers}.

  3. Hook integration: The plugin registers hooks at context construction points — wherever the system decides to include a document or skill in the prompt. Each hook fires the event emission.

  4. Event broadcast: The local server maintains a WebSocket connection to the browser tab. Every event is broadcast immediately.

  5. Aggregation: The UI maintains the frequency map and timeline in local state, updating incrementally on each received event.

This architecture requires no external services, no databases, and no network connectivity beyond localhost. The entire system is ephemeral — session state lives in memory and browser state, and is discarded when the session ends.
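Step 2, the document scan, might look like this. A simplified Python sketch (real Claude Code plugins are not necessarily written in Python, and this version does not skip headers inside fenced code blocks):

```python
import re
from pathlib import Path

def scan_workspace(root: Path) -> dict[str, list[str]]:
    """Build the in-memory document tree: relative path -> list of headers."""
    tree: dict[str, list[str]] = {}
    for md in sorted(root.rglob("*.md")):
        text = md.read_text(encoding="utf-8")
        # Simplified: matches ATX headers anywhere, including inside code fences.
        headers = re.findall(r"^(#{1,6}\s+.+)$", text, flags=re.MULTILINE)
        tree[str(md.relative_to(root))] = [h.strip() for h in headers]
    return tree
```

The resulting `{path: headers}` index is exactly what the sidebar tree renders, and what injection events are later matched against.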


Existing Landscape

Several commercial platforms have emerged in the LLM observability space — tools like Langfuse, Helicone, and various vendor-specific trace platforms. These systems generally focus on:

  • API call-level tracing (prompt in, response out, latency, token cost)
  • Multi-agent execution graphs
  • Production monitoring dashboards

What they don't address is the document-centric, local, real-time visualization of context construction in a single developer session. The gap is most visible in AI coding workflows where the "knowledge base" (docs, specs, skills) is a first-class engineering artifact rather than a RAG index.

The closest analogy in traditional software tooling is a code coverage visualizer — the kind that highlights which lines were executed during a test run. But applied to prompts and documents rather than source code, and operating in real time rather than post-hoc.

This framing suggests both the product positioning and the UX direction: developers already know how to read coverage visualizations. A context heatmap is the same mental model applied to a new domain.


Implications and Best Practices

Design the Context Construction Layer First

The heatmap is only as good as the observability hooks you place in the context construction pipeline. If your current workflow injects documents as opaque blobs without any structured metadata, the heatmap will have nothing to display.

The prerequisite is a context construction layer that:

  • Makes inclusion decisions explicitly (not implicitly via string concatenation)
  • Associates each included piece with its source document and header
  • Emits events at inclusion time

If you're building or refactoring a Claude Code plugin that manages context, this is the right time to design in observability from the start.

Communicate What the Heatmap Actually Shows

Calling this tool an "AI attention heatmap" would be misleading. The accurate framing is:

"Context injection heatmap — visualizing which documents and sections were provided to the model, when, and how often."

This framing is honest, defensible, and still compelling. Users of the tool should understand that "hot" sections are sections that were frequently offered to the model, not sections that the model definitively relied upon. The distinction matters when drawing conclusions about document importance.

Chunk Granularity Is a Tunable Parameter

For most use cases, Markdown headers provide the right granularity. But there are scenarios where finer granularity adds value:

  • Long sections (>500 words) where only a subsection is typically relevant
  • Documents with irregular or sparse header structure
  • Technical reference docs where individual entries (API endpoints, configuration keys) are the natural unit

In these cases, introducing explicit chunk IDs — as HTML comments or in-line annotations — enables finer-grained tracking without restructuring the document.

<!-- chunk-id: api-auth-bearer -->
### Bearer Token Authentication
 
Requests must include an `Authorization: Bearer <token>` header...

This annotation is invisible to readers and requires minimal tooling to parse, but enables exact chunk-level frequency tracking when needed.

Session Scoping vs. Persistent Analytics

For session-scoped analysis, in-memory state is sufficient and appropriate. There is no reason to persist injection events between sessions unless you want cross-session comparison.

If you eventually want to answer questions like "has the ## Security Considerations section become less referenced since we refactored the auth module three weeks ago?" — that's when you need persistent storage and stable identifiers. But that's a later problem. Start with the session-scoped version and add persistence only when you have a concrete use case for it.


Conclusion

The central insight of this design is deceptively simple: you can't observe what a model thinks, but you can observe what you give it to think about.

Building a context usage heatmap for an AI coding agent is a tractable, well-scoped engineering problem once you commit to this framing. The system observes context construction — a process entirely within the developer's control — and visualizes it in a way that makes the "knowledge loading" behavior of an AI session legible and debuggable.

The technical implementation is straightforward: an event stream from context construction hooks, lightweight in-memory aggregation, and a real-time UI update loop over WebSocket. No external services, no model API changes, no attention weight extraction.

What you get is something genuinely useful: the ability to look at a session and say "the error handling spec was loaded 20 times, the deployment guide was loaded once, and the database schema was never loaded" — and make informed decisions about your workspace, your prompts, and your skills accordingly.

In a domain where most observability tooling is still focused on production API metrics, building developer-facing, session-scoped context visualization is an underserved and high-value direction. The gap between "I sent a prompt and got a response" and "I understand exactly what my agent was working with" is one worth closing.
