Claude Code Memory System: Five Layers from Injection to Consolidation
How memdir injection, SessionMemory compaction, ExtractMemories taxonomy, Relevant Memories side-queries, and AutoDream consolidation form a complete agent memory architecture.
Most agent memory implementations are a single text file appended to the system prompt.
Claude Code has five distinct layers, each solving a different problem in the memory lifecycle.
File structure
src/
├── services/
│   ├── autoDream/
│   │   └── autoDream.ts
│   ├── extractMemories/
│   │   └── extractMemories.ts
│   ├── relevantMemories/
│   │   └── relevantMemories.ts
│   └── SessionMemory/
│       └── SessionMemory.ts
├── tasks/
│   └── DreamTask/
│       └── DreamTask.ts
└── utils/
    └── memdir.ts

src/services/SessionMemory/SessionMemory.ts
In-context compaction trigger
Monitors conversation token count and tool call depth. When either threshold is crossed (10k tokens or 3+ tool calls), fires a post-sampling hook that compacts the running context into a structured memory block.
Key Exports
- SessionMemory
- shouldCompact
- compactSession
Why It Matters
- Compaction is triggered by two independent thresholds — token count and tool depth — because either can cause context overflow.
- The hook fires after sampling, not before, so the model sees the full turn before compaction begins.
- Output is a structured block inserted back into the conversation, not a side file.
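The dual-threshold trigger can be sketched as below. The numeric thresholds (10k tokens, 3+ tool calls) come from the description above; the state shape and everything other than the `shouldCompact` name are illustrative assumptions, not the real implementation:

```typescript
// Minimal sketch of the dual-threshold compaction check.
// SessionState and the constant names are assumptions for illustration.
interface SessionState {
  tokenCount: number
  toolCallDepth: number
}

const TOKEN_THRESHOLD = 10_000
const TOOL_DEPTH_THRESHOLD = 3

function shouldCompact(state: SessionState): boolean {
  // Either threshold alone triggers compaction, since either
  // condition can cause context overflow independently.
  return (
    state.tokenCount >= TOKEN_THRESHOLD ||
    state.toolCallDepth >= TOOL_DEPTH_THRESHOLD
  )
}
```

The OR rather than AND matters: a turn with few tool calls can still blow the token budget, and a deep tool chain can overflow context before the raw token count looks alarming.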
src/services/extractMemories/extractMemories.ts
Stashing coalescer and taxonomy engine
After each turn, extracts memory candidates from the conversation using a 4-type taxonomy (fact, preference, skill, relationship). A Sonnet-assisted selection step filters candidates before writing to memdir.
Key Exports
- extractMemories
- MemoryType
- coalesceStash
Why It Matters
- The taxonomy is not cosmetic — it shapes retrieval. Facts and skills are retrieved differently.
- The coalescer deduplicates before writing, preventing memdir from accumulating stale variants of the same fact.
- Sonnet-assisted selection means the model curates its own memory — the quality of extraction depends on the selection prompt.
src/services/relevantMemories/relevantMemories.ts
Contextual memory selection via side-query
Before each turn, fires a lightweight Sonnet side-query that selects at most 5 memory files from memdir that are contextually relevant to the current conversation. Only selected files are injected into context — this keeps the memory footprint bounded even as memdir grows.
Key Exports
- getRelevantMemories
- MemorySelectionPrompt
Why It Matters
- The ≤5 limit is a token budget decision: more than 5 files starts to crowd the context window.
- The selection query runs against the full memdir listing — it reads filenames and metadata, not file contents, to stay fast.
- This layer is what makes a large memdir scale: you pay only for what is relevant to the current turn.
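A sketch of the selection flow, under stated assumptions: the listing shape, the `queryModel` callback, and the prompt wording are all hypothetical; only the at-most-5 cap and the filenames-plus-metadata design come from the description above:

```typescript
// Illustrative sketch: select at most 5 memory files by querying a
// small model against the memdir *listing*, not file contents.
interface MemoryListing {
  filename: string
  summary: string // a metadata line, not the full file contents
}

const MAX_SELECTED = 5

async function getRelevantMemories(
  listing: MemoryListing[],
  conversationSummary: string,
  queryModel: (prompt: string) => Promise<string[]>
): Promise<string[]> {
  const prompt =
    `Conversation: ${conversationSummary}\n` +
    `Memories:\n${listing.map(m => `${m.filename}: ${m.summary}`).join('\n')}\n` +
    `Return the filenames most relevant to the conversation.`
  const chosen = await queryModel(prompt)
  // Enforce the token-budget cap regardless of what the model returns,
  // and drop any hallucinated filenames not present in the listing.
  const valid = new Set(listing.map(m => m.filename))
  return chosen.filter(f => valid.has(f)).slice(0, MAX_SELECTED)
}
```

Note the cap is enforced in code, not just requested in the prompt: the model's output is treated as a suggestion to be validated, not trusted directly.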
src/services/autoDream/autoDream.ts
Background memory consolidation with 5-gate access control
Runs periodic consolidation of memdir contents into a compressed long-term memory file. Protected by 5 gates: enabled flag, 24h interval, 10-minute throttle, 5-session minimum, and an mtime-based distributed lock.
Key Exports
- maybeRunAutoDream
- AutoDreamGate
- acquireDreamLock
Why It Matters
- The mtime-based lock prevents two Claude Code instances from dreaming simultaneously in shared environments.
- The 5-session minimum ensures consolidation happens over real usage, not a single long session.
- AutoDream output is written back into memdir, not a separate file — the same retrieval path reads consolidated memory.
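The five gates compose as a short-circuiting sequence. The sketch below is an assumption about shape, not the real code; the gate values (enabled flag, 24h, 10 min, 5 sessions, mtime lock) come from the description above:

```typescript
// Sketch of the five sequential AutoDream gates.
// DreamState and its field names are illustrative assumptions.
interface DreamState {
  enabled: boolean
  lastDreamAt: number       // ms epoch of last completed consolidation
  lastAttemptAt: number     // ms epoch of last attempt (throttle)
  sessionsSinceDream: number
  lockMtime: number | null  // mtime of the lock file, if one exists
}

const DAY_MS = 24 * 60 * 60 * 1000
const THROTTLE_MS = 10 * 60 * 1000
const MIN_SESSIONS = 5

function passesGates(state: DreamState, now: number): boolean {
  if (!state.enabled) return false                           // gate 1: enabled flag
  if (now - state.lastDreamAt < DAY_MS) return false         // gate 2: 24h interval
  if (now - state.lastAttemptAt < THROTTLE_MS) return false  // gate 3: 10-min throttle
  if (state.sessionsSinceDream < MIN_SESSIONS) return false  // gate 4: 5-session minimum
  // gate 5: mtime lock; another instance holds it if the lock
  // file's mtime still falls inside the throttle window
  if (state.lockMtime !== null && now - state.lockMtime < THROTTLE_MS) return false
  return true
}
```

Ordering the cheap boolean and arithmetic gates before the filesystem-backed lock check means most invocations return without touching the lock at all.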
src/tasks/DreamTask/DreamTask.ts
UI surface for AutoDream
Wraps AutoDream execution as a Task, making consolidation observable in the session UI. Exposes progress state and cancellation so the user can see and stop long-running consolidation.
Key Exports
- DreamTask
Why It Matters
- DreamTask is the reason AutoDream appears as a named task in the UI rather than as hidden background work.
- This is the same pattern as other background work: promote to Task, make it observable, add cancellation.
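The promote-to-Task pattern can be sketched as below. This is a hypothetical minimal shape, assuming a progress callback and a cancellation flag; Claude Code's real Task interface may differ:

```typescript
// Minimal sketch of the promote-to-Task pattern: wrap background work
// with observable progress and user-driven cancellation.
type Progress = (fraction: number, label: string) => void

class DreamTask {
  private cancelled = false

  cancel(): void {
    this.cancelled = true
  }

  async run(steps: string[], onProgress: Progress): Promise<boolean> {
    for (let i = 0; i < steps.length; i++) {
      if (this.cancelled) return false // user stopped consolidation
      onProgress((i + 1) / steps.length, steps[i])
      // ...each step would perform real consolidation work here
    }
    return true // ran to completion
  }
}
```

The return value distinguishes "completed" from "cancelled", so the UI can render the right terminal state for the task.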
The five-layer model
Memory lifecycle in Claude Code
From short-term prompt injection to long-term dream consolidation.
Layer 1: memdir — prompt injection
utils/memdir.ts
Files in ~/.claude/memories/ are read at session start and injected into the system prompt. This is the base retrieval layer — fast, always-on, bounded by what fits in context.

Layer 2: Relevant Memories — contextual selection
services/relevantMemories/relevantMemories.ts
Before each turn, a lightweight Sonnet side-query selects at most 5 files from memdir that are contextually appropriate. Only those files are injected. This keeps the memory footprint bounded as memdir grows across sessions.

Layer 3: SessionMemory — in-context compaction
services/SessionMemory/SessionMemory.ts
During a long turn, when token count exceeds 10k or tool calls reach 3+, SessionMemory fires a post-sampling hook to compact the running context into a structured block.

Layer 4: ExtractMemories — taxonomy and stashing
services/extractMemories/extractMemories.ts
After each turn, candidates are extracted with a 4-type taxonomy (fact, preference, skill, relationship), filtered by Sonnet-assisted selection, and coalesced before writing to memdir.

Layer 5: AutoDream — background consolidation
services/autoDream/autoDream.ts, tasks/DreamTask/DreamTask.ts
Periodically, AutoDream compresses memdir contents into a denser long-term memory block. Five gates control access: enabled flag, 24h interval, 10-min throttle, 5-session minimum, mtime-based distributed lock.
Why five layers instead of one
A single-layer memory design hits predictable failure modes:
- inject everything: context overflows on long sessions
- extract and store: stale variants accumulate without compaction
- compact only: cross-session facts are lost between restarts
- no selection: a large memdir crowds context even when most of it is irrelevant
- no consolidation: retrieval quality degrades as memdir grows
Each layer in Claude Code addresses one of those failures.
The retrieval layer (memdir) is bounded. The selection layer (Relevant Memories) filters to what matters for the current turn. The compaction layer (SessionMemory) manages in-flight overflow. The extraction layer (ExtractMemories) populates the store with curated facts. The consolidation layer (AutoDream) keeps the store from degrading over time.
The distributed lock design
The mtime-based lock in AutoDream is worth examining separately.
The problem: if a user runs multiple Claude Code instances in the same environment — terminals, IDE integration, background sessions — each instance will independently check whether it is time to dream.
Without a lock, two instances would consolidate simultaneously, producing conflicting writes to memdir.
The solution is an mtime-based file lock:
- check if the lock file exists and its mtime is within the throttle window
- if not, write a new lock file (updating its mtime atomically)
- proceed with consolidation
- release by writing a completion marker
This is not a POSIX advisory lock. It is a time-window-based coordination primitive that works across processes without shared memory.
It is the right design for an agent runtime that runs concurrently without an always-on coordinator.
Sonnet-assisted extraction is a product decision
The choice to use Sonnet for memory selection inside extractMemories.ts deserves attention.
It means extraction quality is not just a function of what happened in the conversation — it is also a function of how well the selection prompt is written.
That is both a strength (the model can recognize nuanced facts that a regex would miss) and a risk (a poor selection prompt causes high-value facts to be filtered out before they reach memdir).
Claude Code controls this risk by using a specific extraction prompt tuned to the 4-type taxonomy. Changing the taxonomy means changing the extraction prompt.
The memory system pattern in code
The core pattern: extract, coalesce, gate-check, write. The TypeScript below shows the shape from learning-claude-code.
Taxonomy-driven extraction with coalescer — actual pattern from learning-claude-code
type MemoryType = 'fact' | 'preference' | 'skill' | 'relationship'

interface MemoryCandidate {
  type: MemoryType
  content: string
  confidence: number
  sourceConversationId: string
}

async function extractMemories(
  conversation: Message[],
  existingMemories: MemoryCandidate[]
): Promise<void> {
  // 1. Extract candidates using taxonomy-aware prompt
  const candidates = await runExtractionPrompt(conversation)

  // 2. Coalesce: deduplicate against existing memories
  const novel = coalesceStash(candidates, existingMemories)

  // 3. Sonnet-assisted selection: filter to high-value subset
  const selected = await runSelectionPrompt(novel)

  // 4. Write to memdir — each fact as its own file
  for (const memory of selected) {
    await writeMemoryFile(memory)
  }
}

// Coalescer: prevent stale variants accumulating
function coalesceStash(
  candidates: MemoryCandidate[],
  existing: MemoryCandidate[]
): MemoryCandidate[] {
  return candidates.filter(c =>
    !existing.some(e =>
      e.type === c.type && semanticallySimilar(e.content, c.content)
    )
  )
}

Quiz: Claude Code's AutoDream uses an mtime-based file lock instead of a POSIX advisory lock. What problem does this solve that a standard file lock cannot?
Context: Multiple Claude Code instances may run simultaneously in the same environment (terminal + IDE integration + background sessions), all sharing the same memdir. AutoDream consolidates memdir contents into long-term memory.

A. mtime locks are faster than POSIX locks for large files
Incorrect. Performance is not the reason. POSIX advisory locks are fast. The issue is cross-process coordination semantics, not speed.

B. POSIX advisory locks do not survive process crashes, so a crashed Claude Code instance would leave memdir locked permanently
Incorrect. POSIX advisory locks ARE released on process crash — the kernel cleans them up. This is the opposite of the real concern.

C. The mtime-based lock creates a time-window coordination primitive that works across processes sharing a filesystem without requiring a lock daemon or shared memory
Correct. POSIX advisory locks require the processes to cooperate via the same kernel lock table, which only works within one machine and requires the holding process to be alive. An mtime-based lock works across any processes that share a filesystem — including remote mounts — and is self-expiring: if a process crashes mid-dream, the lock expires after the throttle window and another instance can proceed.

D. mtime locks prevent the memdir from being corrupted if two instances write simultaneously
Incorrect. Write corruption is a separate concern from coordination. The mtime lock prevents two instances from both deciding to consolidate at the same time — but it does not protect individual file writes. The temp+rename pattern handles atomic writes.