Claude Code Memory System: Five Layers from Injection to Consolidation
How memdir injection, SessionMemory compaction, ExtractMemories taxonomy, Relevant Memories side-queries, and AutoDream consolidation form a complete agent memory architecture.
Most agent memory implementations are a single text file appended to the system prompt.
Claude Code has five distinct layers, each solving a different problem in the memory lifecycle.
File structure
src/
├── services/
│   ├── autoDream/
│   │   └── autoDream.ts
│   ├── extractMemories/
│   │   └── extractMemories.ts
│   ├── relevantMemories/
│   │   └── relevantMemories.ts
│   └── SessionMemory/
│       └── SessionMemory.ts
├── tasks/
│   └── DreamTask/
│       └── DreamTask.ts
└── utils/
    └── memdir.ts

src/services/SessionMemory/SessionMemory.ts
In-context compaction trigger
Monitors conversation token count and tool call depth. When either threshold is crossed (10k tokens or 3+ tool calls), fires a post-sampling hook that compacts the running context into a structured memory block.
Key Exports
- SessionMemory
- shouldCompact
- compactSession
Why It Matters
- Compaction is triggered by two independent thresholds — token count and tool depth — because either can cause context overflow.
- The hook fires after sampling, not before, so the model sees the full turn before compaction begins.
- Output is a structured block inserted back into the conversation, not a side file.
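The dual-threshold trigger can be sketched as below. The numeric thresholds (10k tokens, 3+ tool calls) come from the description above; the state shape and everything other than the `shouldCompact` name are illustrative assumptions, not the real implementation:

```typescript
// Minimal sketch of the dual-threshold compaction check.
// SessionState and the constant names are assumptions for illustration.
interface SessionState {
  tokenCount: number
  toolCallDepth: number
}

const TOKEN_THRESHOLD = 10_000
const TOOL_DEPTH_THRESHOLD = 3

function shouldCompact(state: SessionState): boolean {
  // Either threshold alone triggers compaction, since either
  // condition can cause context overflow independently.
  return (
    state.tokenCount >= TOKEN_THRESHOLD ||
    state.toolCallDepth >= TOOL_DEPTH_THRESHOLD
  )
}
```

The OR rather than AND matters: a turn with few tool calls can still blow the token budget, and a deep tool chain can overflow context before the raw token count looks alarming.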
src/services/extractMemories/extractMemories.ts
Stashing coalescer and taxonomy engine
After each turn, extracts memory candidates from the conversation using a 4-type taxonomy (fact, preference, skill, relationship). A Sonnet-assisted selection step filters candidates before writing to memdir.
Key Exports
- extractMemories
- MemoryType
- coalesceStash
Why It Matters
- The taxonomy is not cosmetic — it shapes retrieval. Facts and skills are retrieved differently.
- The coalescer deduplicates before writing, preventing memdir from accumulating stale variants of the same fact.
- Sonnet-assisted selection means the model curates its own memory — the quality of extraction depends on the selection prompt.
src/services/relevantMemories/relevantMemories.ts
Contextual memory selection via side-query
Before each turn, fires a lightweight Sonnet side-query that selects at most 5 memory files from memdir that are contextually relevant to the current conversation. Only selected files are injected into context — this keeps the memory footprint bounded even as memdir grows.
Key Exports
- getRelevantMemories
- MemorySelectionPrompt
Why It Matters
- The ≤5 limit is a token budget decision: more than 5 files starts to crowd the context window.
- The selection query runs against the full memdir listing — it reads filenames and metadata, not file contents, to stay fast.
- This layer is what makes a large memdir scale: you pay only for what is relevant to the current turn.
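A sketch of the selection flow, under stated assumptions: the listing shape, the `queryModel` callback, and the prompt wording are all hypothetical; only the at-most-5 cap and the filenames-plus-metadata design come from the description above:

```typescript
// Illustrative sketch: select at most 5 memory files by querying a
// small model against the memdir *listing*, not file contents.
interface MemoryListing {
  filename: string
  summary: string // a metadata line, not the full file contents
}

const MAX_SELECTED = 5

async function getRelevantMemories(
  listing: MemoryListing[],
  conversationSummary: string,
  queryModel: (prompt: string) => Promise<string[]>
): Promise<string[]> {
  const prompt =
    `Conversation: ${conversationSummary}\n` +
    `Memories:\n${listing.map(m => `${m.filename}: ${m.summary}`).join('\n')}\n` +
    `Return the filenames most relevant to the conversation.`
  const chosen = await queryModel(prompt)
  // Enforce the token-budget cap regardless of what the model returns,
  // and drop any hallucinated filenames not present in the listing.
  const valid = new Set(listing.map(m => m.filename))
  return chosen.filter(f => valid.has(f)).slice(0, MAX_SELECTED)
}
```

Note the cap is enforced in code, not just requested in the prompt: the model's output is treated as a suggestion to be validated, not trusted directly.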
src/services/autoDream/autoDream.ts
Background memory consolidation with 5-gate access control
Runs periodic consolidation of memdir contents into a compressed long-term memory file. Protected by 5 gates: enabled flag, 24h interval, 10-minute throttle, 5-session minimum, and an mtime-based distributed lock.
Key Exports
- maybeRunAutoDream
- AutoDreamGate
- acquireDreamLock
Why It Matters
- The mtime-based lock prevents two Claude Code instances from dreaming simultaneously in shared environments.
- The 5-session minimum ensures consolidation happens over real usage, not a single long session.
- AutoDream output is written back into memdir, not a separate file — the same retrieval path reads consolidated memory.
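The five gates compose as a short-circuiting sequence. The sketch below is an assumption about shape, not the real code; the gate values (enabled flag, 24h, 10 min, 5 sessions, mtime lock) come from the description above:

```typescript
// Sketch of the five sequential AutoDream gates.
// DreamState and its field names are illustrative assumptions.
interface DreamState {
  enabled: boolean
  lastDreamAt: number       // ms epoch of last completed consolidation
  lastAttemptAt: number     // ms epoch of last attempt (throttle)
  sessionsSinceDream: number
  lockMtime: number | null  // mtime of the lock file, if one exists
}

const DAY_MS = 24 * 60 * 60 * 1000
const THROTTLE_MS = 10 * 60 * 1000
const MIN_SESSIONS = 5

function passesGates(state: DreamState, now: number): boolean {
  if (!state.enabled) return false                           // gate 1: enabled flag
  if (now - state.lastDreamAt < DAY_MS) return false         // gate 2: 24h interval
  if (now - state.lastAttemptAt < THROTTLE_MS) return false  // gate 3: 10-min throttle
  if (state.sessionsSinceDream < MIN_SESSIONS) return false  // gate 4: 5-session minimum
  // gate 5: mtime lock; another instance holds it if the lock
  // file's mtime still falls inside the throttle window
  if (state.lockMtime !== null && now - state.lockMtime < THROTTLE_MS) return false
  return true
}
```

Ordering the cheap boolean and arithmetic gates before the filesystem-backed lock check means most invocations return without touching the lock at all.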
src/tasks/DreamTask/DreamTask.ts
UI surface for AutoDream
Wraps AutoDream execution as a Task, making consolidation observable in the session UI. Exposes progress state and cancellation so the user can see and stop long-running consolidation.
Key Exports
- DreamTask
Why It Matters
- DreamTask is the reason AutoDream appears as a named task in the UI rather than as hidden background work.
- This is the same pattern as other background work: promote to Task, make it observable, add cancellation.
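The promote-to-Task pattern can be sketched as below. This is a hypothetical minimal shape, assuming a progress callback and a cancellation flag; Claude Code's real Task interface may differ:

```typescript
// Minimal sketch of the promote-to-Task pattern: wrap background work
// with observable progress and user-driven cancellation.
type Progress = (fraction: number, label: string) => void

class DreamTask {
  private cancelled = false

  cancel(): void {
    this.cancelled = true
  }

  async run(steps: string[], onProgress: Progress): Promise<boolean> {
    for (let i = 0; i < steps.length; i++) {
      if (this.cancelled) return false // user stopped consolidation
      onProgress((i + 1) / steps.length, steps[i])
      // ...each step would perform real consolidation work here
    }
    return true // ran to completion
  }
}
```

The return value distinguishes "completed" from "cancelled", so the UI can render the right terminal state for the task.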
The five-layer model
Memory lifecycle in Claude Code
From short-term prompt injection to long-term dream consolidation.
Layer 1: memdir — prompt injection
utils/memdir.ts
Files in ~/.claude/memories/ are read at session start and injected into the system prompt. This is the base retrieval layer — fast, always-on, bounded by what fits in context.

Layer 2: Relevant Memories — contextual selection
services/relevantMemories/relevantMemories.ts
Before each turn, a lightweight Sonnet side-query selects at most 5 files from memdir that are contextually appropriate. Only those files are injected. This keeps the memory footprint bounded as memdir grows across sessions.

Layer 3: SessionMemory — in-context compaction
services/SessionMemory/SessionMemory.ts
During a long turn, when token count exceeds 10k or tool calls reach 3+, SessionMemory fires a post-sampling hook to compact the running context into a structured block.

Layer 4: ExtractMemories — taxonomy and stashing
services/extractMemories/extractMemories.ts
After each turn, candidates are extracted with a 4-type taxonomy (fact, preference, skill, relationship), filtered by Sonnet-assisted selection, and coalesced before writing to memdir.

Layer 5: AutoDream — background consolidation
services/autoDream/autoDream.ts, tasks/DreamTask/DreamTask.ts
Periodically, AutoDream compresses memdir contents into a denser long-term memory block. Five gates control access: enabled flag, 24h interval, 10-min throttle, 5-session minimum, mtime-based distributed lock.
Why five layers instead of one
A single-layer memory design hits predictable failure modes:
- inject everything: context overflows on long sessions
- extract and store: stale variants accumulate without compaction
- compact only: cross-session facts are lost between restarts
- no selection: a large memdir crowds context even when most of it is irrelevant
- no consolidation: retrieval quality degrades as memdir grows
Each layer in Claude Code addresses one of those failures.
The retrieval layer (memdir) is bounded. The selection layer (Relevant Memories) filters to what matters for the current turn. The compaction layer (SessionMemory) manages in-flight overflow. The extraction layer (ExtractMemories) populates the store with curated facts. The consolidation layer (AutoDream) keeps the store from degrading over time.
The distributed lock design
The mtime-based lock in AutoDream is worth examining separately.
The problem: if a user runs multiple Claude Code instances in the same environment — terminals, IDE integration, background sessions — each instance will independently check whether it is time to dream.
Without a lock, two instances would consolidate simultaneously, producing conflicting writes to memdir.
The solution is an mtime-based file lock:
- check if the lock file exists and its mtime is within the throttle window
- if not, write a new lock file (updating its mtime atomically)
- proceed with consolidation
- release by writing a completion marker
This is not a POSIX advisory lock. It is a time-window-based coordination primitive that works across processes without shared memory.
It is the right design for an agent runtime that runs concurrently without an always-on coordinator.
Sonnet-assisted extraction is a product decision
The choice to use Sonnet for memory selection inside extractMemories.ts deserves attention.
It means extraction quality is not just a function of what happened in the conversation — it is also a function of how well the selection prompt is written.
That is both a strength (the model can recognize nuanced facts that a regex would miss) and a risk (a poor selection prompt causes high-value facts to be filtered out before they reach memdir).
Claude Code controls this risk by using a specific extraction prompt tuned to the 4-type taxonomy. Changing the taxonomy means changing the extraction prompt.
The memory system pattern in code
The core pattern: extract, coalesce, gate-check, write. The TypeScript below shows the shape from learning-claude-code.
Taxonomy-driven extraction with coalescer — actual pattern from learning-claude-code
type MemoryType = 'fact' | 'preference' | 'skill' | 'relationship'

interface MemoryCandidate {
  type: MemoryType
  content: string
  confidence: number
  sourceConversationId: string
}

async function extractMemories(
  conversation: Message[],
  existingMemories: MemoryCandidate[]
): Promise<void> {
  // 1. Extract candidates using taxonomy-aware prompt
  const candidates = await runExtractionPrompt(conversation)

  // 2. Coalesce: deduplicate against existing memories
  const novel = coalesceStash(candidates, existingMemories)

  // 3. Sonnet-assisted selection: filter to high-value subset
  const selected = await runSelectionPrompt(novel)

  // 4. Write to memdir — each fact as its own file
  for (const memory of selected) {
    await writeMemoryFile(memory)
  }
}

// Coalescer: prevent stale variants accumulating
function coalesceStash(
  candidates: MemoryCandidate[],
  existing: MemoryCandidate[]
): MemoryCandidate[] {
  return candidates.filter(c =>
    !existing.some(e =>
      e.type === c.type && semanticallySimilar(e.content, c.content)
    )
  )
}

Quiz: Claude Code's AutoDream uses an mtime-based file lock instead of a POSIX advisory lock. What problem does this solve that a standard file lock cannot?
Context: Multiple Claude Code instances may run simultaneously in the same environment (terminal + IDE integration + background sessions), all sharing the same memdir. AutoDream consolidates memdir contents into long-term memory.

A. mtime locks are faster than POSIX locks for large files
Incorrect. Performance is not the reason. POSIX advisory locks are fast. The issue is cross-process coordination semantics, not speed.

B. POSIX advisory locks do not survive process crashes, so a crashed Claude Code instance would leave memdir locked permanently
Incorrect. POSIX advisory locks ARE released on process crash — the kernel cleans them up. This is the opposite of the real concern.

C. The mtime-based lock creates a time-window coordination primitive that works across processes sharing a filesystem without requiring a lock daemon or shared memory
Correct. POSIX advisory locks require the processes to cooperate via the same kernel lock table, which only works within one machine and requires the holding process to be alive. An mtime-based lock works across any processes that share a filesystem — including remote mounts — and is self-expiring: if a process crashes mid-dream, the lock expires after the throttle window and another instance can proceed.

D. mtime locks prevent the memdir from being corrupted if two instances write simultaneously
Incorrect. Write corruption is a separate concern from coordination. The mtime lock prevents two instances from both deciding to consolidate at the same time — but it does not protect individual file writes. The temp+rename pattern handles atomic writes.