Claude Code Query Loop: The Recoverable Turn Engine
How query.ts implements a streaming AsyncGenerator with seven-layer error recovery, message preprocessing, and two tool execution modes.
If the boot pipeline explains how Claude Code starts, query.ts explains what it actually is.
This file is the core runtime loop — not a chat handler, not an API wrapper. A governed, recoverable state machine that owns everything from context preparation to tool execution to failure recovery. Every agent turn flows through this file, and the design choices here determine how the system behaves under pressure.
File structure
src/
├── query.ts
├── query/
│ ├── config.ts
│ ├── tokenBudget.ts
│ └── stopHooks.ts
├── services/
│ ├── compact/
│ │ ├── autoCompact.ts
│ │ ├── microCompact.ts
│ │ └── contextCollapse.ts

src/query.ts
Recoverable streaming turn loop
Coordinates the main turn lifecycle: prepare context, stream model output, execute tools, manage compaction, handle fallback and recovery, and decide whether another loop iteration is required.
Key Exports
- query
- QueryParams
- State
- Terminal
- QueryDeps
Why It Matters
- Claude Code treats a turn as a state machine, not a single API request.
- Context management is part of the core loop, not an outer wrapper.
- Error recovery is a first-class runtime contract — not a last-resort catch block.
Related Files
- Query config: src/query/config.ts
- Token budget: src/query/tokenBudget.ts
- Stop hooks: src/query/stopHooks.ts
- Auto compact: src/services/compact/autoCompact.ts
- Micro compact: src/services/compact/microCompact.ts
What a turn actually is
The usual description of an agent loop — send prompt, get tool call, run tool, send result back — fits on a slide but doesn't explain any of the real behavior.
A turn in query.ts is a while(true) loop that carries mutable cross-iteration state. That loop only exits when a Terminal value is returned. Every iteration does these things, in this order:
One Claude Code turn — the real sequence
Each iteration of queryLoop prepares a context window, calls the model, streams the response, executes tools, then decides whether to continue or return Terminal.
- 1
Snapshot immutable config once, destructure mutable state at top of each iteration
State / buildQueryConfig(): Config (feature flags, env, session snapshot) is built once at loop entry. State — messages, toolUseContext, compaction tracking — is destructured fresh each iteration so reassignment at continue sites is clean.
This separation between frozen config and mutable state is architectural. It means feature() gates are evaluated once and the result is stable for the entire turn, while state can be mutated between iterations without risk of stale reads.
- 2
Apply the five-step message preprocessing pipeline
applyToolResultBudget / microcompact / autocompact: Before the model sees anything: tool result budget enforcement, history snip, microcompact, context collapse, autocompact. These run in a fixed order because they interact — collapse before autocompact means you may avoid a full compact.
The pipeline is load-bearing in its ordering. Changing the sequence breaks the composed behavior.
- 3
Build the full system prompt and stream the assistant response
deps.callModel / StreamingToolExecutor: System context is appended, attachments are injected, and the model is called via deps.callModel. Assistant output is consumed incrementally — tool_use blocks are detected during streaming.
- 4
Execute tools in parallel and capture results
StreamingToolExecutor / runTools: Tool results are normalized back into the conversation. The StreamingToolExecutor can run tools concurrently as their inputs complete during the stream, rather than waiting for the whole response.
- 5
Run stop hooks and check seven recovery conditions
handleStopHooks / checkTokenBudget / maxOutputTokensRecovery: After tools finish: stop hooks fire, token budget is checked, max_output_tokens recovery is attempted, prompt-too-long recovery is attempted. Any of these can trigger another loop iteration.
- 6
Return Terminal or continue
Terminal / Continue transitions: If no recovery condition fired and the model returned end_turn, the loop exits with a Terminal value. Otherwise state is updated and the loop continues.
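The frozen-config / mutable-state loop shape from steps 1 and 6 can be sketched as follows. This is an illustrative reduction, not the real queryLoop: QueryConfig and QueryState carry only enough fields to show the pattern, and the bodies are stubs.

```typescript
// Sketch of the loop skeleton: config is snapshotted once, state is
// destructured fresh each iteration, and the only exit is a Terminal.
type QueryConfig = Readonly<{ maxTurns: number }>;
type QueryState = { messages: string[]; turnCount: number };
type Terminal = { reason: "end_turn" | "max_turns" };

function buildQueryConfig(): QueryConfig {
  // Feature flags and env are evaluated exactly once, then frozen,
  // so gate results are stable for the entire turn.
  return Object.freeze({ maxTurns: 3 });
}

function queryLoop(initial: QueryState): Terminal {
  const config = buildQueryConfig(); // snapshot once at loop entry
  let state = initial;
  while (true) {
    // Destructure mutable state fresh each iteration, so reassignment
    // at continue sites below stays clean and there are no stale reads.
    const { turnCount } = state;
    if (turnCount >= config.maxTurns) {
      return { reason: "max_turns" }; // the only way out is a Terminal
    }
    state = { ...state, turnCount: turnCount + 1 }; // a continue site
  }
}
```

The point of the shape: reassigning the whole state object at each continue site keeps every iteration's reads consistent, while the frozen config can never drift mid-turn.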
The State type is the real design document
The mutable loop state is a single object:
type State = {
messages: Message[]
toolUseContext: ToolUseContext
autoCompactTracking: AutoCompactTrackingState | undefined
maxOutputTokensRecoveryCount: number
hasAttemptedReactiveCompact: boolean
maxOutputTokensOverride: number | undefined
pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
stopHookActive: boolean | undefined
turnCount: number
transition: Continue | undefined
}
Every field in State documents a failure mode the runtime had to address:
- maxOutputTokensRecoveryCount — models truncate output at their limit. The loop retries with a lower override, up to a maximum attempt count.
- hasAttemptedReactiveCompact — reactive compaction (triggered by prompt_too_long errors) is gated to one attempt per turn.
- autoCompactTracking — tracks whether proactive compact has fired this turn and how many consecutive failures occurred. The circuit breaker lives here.
- stopHookActive — hooks can pause continuation even after tool results arrive.
- transition — the reason the previous iteration continued. Absent on the first iteration. Used by tests to assert recovery paths fired without inspecting message content.
If you want to understand what this runtime is trying to survive, read State.
QueryParams and the dependency injection seam
export type QueryParams = {
messages: Message[]
systemPrompt: SystemPrompt
userContext: { [k: string]: string }
systemContext: { [k: string]: string }
canUseTool: CanUseToolFn
toolUseContext: ToolUseContext
fallbackModel?: string
querySource: QuerySource
maxOutputTokensOverride?: number
maxTurns?: number
skipCacheWrite?: boolean
taskBudget?: { total: number }
deps?: QueryDeps
}
The deps field is the test seam. QueryDeps has exactly four method signatures:
type QueryDeps = {
callModel: (params: ModelCallParams) => AsyncGenerator<StreamEvent>
compact: (messages: Message[], opts: CompactOpts) => Promise<Message[]>
uuid: () => string
now: () => number
}
Four methods is not an accident. In production, productionDeps() provides real implementations. In tests, injected deps let you control uuid generation, model calls, compaction behavior, and time — without mocking at the module level. The surface is small enough to fake completely, and broad enough to cover every external dependency the loop touches.
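A test-side fake of this seam might look like the following sketch. It assumes the four-method shape shown above; the stub bodies, the simplified types, and the helper name fakeDeps are all illustrative, not the real productionDeps counterpart.

```typescript
// Hypothetical fake of the QueryDeps seam: deterministic model output,
// deterministic compaction, deterministic ids and time.
type StreamEvent = { type: string; text?: string };
type Message = { role: string; content: string };

type QueryDeps = {
  callModel: (params: unknown) => AsyncGenerator<StreamEvent>;
  compact: (messages: Message[]) => Promise<Message[]>;
  uuid: () => string;
  now: () => number;
};

function fakeDeps(): QueryDeps {
  let nextId = 0;
  return {
    // No network: emit a fixed stream so tests control the model exactly.
    callModel: async function* () {
      yield { type: "text", text: "ok" };
    },
    // Trivial compaction: keep only the last message.
    compact: async (messages) => messages.slice(-1),
    // Sequential ids and frozen time make transcripts reproducible.
    uuid: () => `uuid-${nextId++}`,
    now: () => 1_700_000_000_000,
  };
}
```

Because the surface is only four functions, a test can swap in this object wholesale instead of reaching for module-level mocks.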
querySource identifies who called query() — repl_main_thread, agent:some-id, print_mode, etc. Several behaviors are conditioned on this: whether tool result replacements are persisted, whether headless profiler checkpoints fire, and which analytics events are emitted.
The five-step message preprocessing pipeline
Before every API call, the message array passes through a fixed pipeline:
Step 1 — Tool result budget enforcement. Individual tool results that exceed their size limit are replaced with a truncated summary. This runs first so oversized results don't inflate the context that subsequent steps operate on.
Step 2 — History snip. Oldest non-essential messages are removed when the total approaches the context window ceiling. Snip is conservative: it removes assistant turns before user turns, and never removes the most recent user message.
Step 3 — Microcompact. Adjacent tool-use/tool-result pairs that are no longer needed for reasoning are collapsed. Critically, microcompact can operate as a cache edit — modifying an existing cached prompt entry rather than replacing messages — which preserves prompt caching benefits for the unchanged prefix.
Step 4 — Context collapse (feature-flagged). A persistent commit log of collapsed sections. Unlike autocompact, collapses are selective and reversible projections: the full history stays in the REPL array, only the model's view is trimmed. This is the last line of defense before full summarization.
Step 5 — Autocompact. When the token count crosses a threshold, fires a full summarization via a separate model call and resets compaction tracking counters. Includes a circuit breaker: autoCompactTracking.consecutiveFailures is tracked and compaction is skipped if it keeps failing.
📝 Why the pipeline order is load-bearing
Each step can short-circuit the next. If context collapse gets the window under the autocompact threshold, the full summarization never fires. That tradeoff is correct: granular context is more useful than a summary, so you summarize only when you have no other option.
The sequence budget → snip → microcompact → collapse → autocompact is not arbitrary. Reversing collapse and microcompact, for instance, would mean you might microcompact content that context collapse would have removed more cleanly — wasting cache edit budget on messages that were going to disappear anyway.
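The fixed-order composition can be sketched as a reduce over step functions. This is a hypothetical reduction with stub bodies for the first two steps only; the thresholds, the `essential` field, and the step names as written here are invented for illustration.

```typescript
// Sketch of a fixed-order preprocessing pipeline: each step is a pure
// Message[] -> Message[] transform, composed in a load-bearing sequence.
type Msg = { role: string; content: string; essential?: boolean };
type Step = (msgs: Msg[]) => Msg[];

// Step 1 stub: replace oversized tool results with a truncated summary.
const enforceToolResultBudget: Step = (msgs) =>
  msgs.map((m) =>
    m.content.length > 100
      ? { ...m, content: m.content.slice(0, 100) + " [truncated]" }
      : m
  );

// Step 2 stub: drop oldest non-essential messages past a window ceiling.
const historySnip: Step = (msgs) =>
  msgs.length > 8
    ? msgs.filter((m, i) => m.essential || i >= msgs.length - 8)
    : msgs;

// Budget runs before snip so oversized results don't inflate what snip
// measures; collapse would run before autocompact so a cheap trim can
// avert a full summarization.
const pipeline: Step[] = [
  enforceToolResultBudget,
  historySnip,
  // microcompact, contextCollapse, autocompact would follow here
];

const preprocess = (msgs: Msg[]) =>
  pipeline.reduce((acc, step) => step(acc), msgs);
```

Expressing the steps as a list makes the ordering an explicit, reviewable value rather than an accident of call-site sequencing.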
The streaming tool executor
Tools are not executed after the full response arrives. The StreamingToolExecutor runs tools as their inputs complete during streaming.
This matters for multi-tool turns. If the model emits three tool calls, the first tool can start executing while the second and third are still being streamed. Model output latency and tool execution latency overlap rather than stack.
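The overlap idea can be sketched without the real StreamingToolExecutor API, assuming tool calls arrive as an async stream. All names here are illustrative stand-ins.

```typescript
// Sketch: kick off each tool as soon as its input finishes streaming,
// instead of waiting for the whole model response to arrive.
type ToolCall = { id: string; name: string; input: unknown };

// Stand-in for the model stream emitting completed tool_use blocks.
async function* fakeStream(): AsyncGenerator<ToolCall> {
  yield { id: "t1", name: "read", input: { path: "a.ts" } };
  yield { id: "t2", name: "read", input: { path: "b.ts" } };
}

// Stand-in for actually running a tool.
async function runTool(call: ToolCall): Promise<string> {
  return `${call.id}:done`;
}

async function executeStreaming(stream: AsyncGenerator<ToolCall>) {
  const pending: Promise<string>[] = [];
  for await (const call of stream) {
    pending.push(runTool(call)); // started mid-stream, not after it ends
  }
  return Promise.all(pending); // results come back in tool_use order
}
```

The key property is that tool latency and stream latency overlap: by the time the last tool_use block finishes streaming, earlier tools may already have results.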
When the fallback model fires after a streaming error from the primary model, orphaned partial messages are yielded as tombstone events so the UI and transcript can remove them cleanly. The executor is discarded and a fresh one is created for the fallback attempt.
Seven recovery conditions
The loop handles seven conditions that trigger another iteration rather than exiting:
Recovery conditions in queryLoop
Seven conditions that trigger continue rather than Terminal. The annotated code below matches the actual source structure.
// 1. Fallback model: primary model streaming failed — retry with fallback
if (streamingError && fallbackModel && !hasUsedFallback) {
// tombstone partial messages, discard executor, retry
state = { ...state, transition: { type: 'fallback_model' } }
continue
}
// 2. Autocompact circuit breaker: compact fired but failed — skip and continue
if (compactFailed && consecutiveFailures < CIRCUIT_BREAKER_LIMIT) {
state = { ...state, transition: { type: 'compact_skip' } }
continue
}
// 3. max_output_tokens: model truncated output — retry with lower limit
if (isWithheldMaxOutputTokens(lastAssistantMsg)) {
if (maxOutputTokensRecoveryCount < MAX_RECOVERY_ATTEMPTS) {
state = {
...state,
maxOutputTokensRecoveryCount: maxOutputTokensRecoveryCount + 1,
maxOutputTokensOverride: reducedLimit,
transition: { type: 'max_output_tokens_recovery' },
}
continue
}
}
// 4. prompt_too_long: context window exceeded — try reactive compact
if (isPromptTooLongMessage(lastAssistantMsg)) {
if (!hasAttemptedReactiveCompact && reactiveCompact) {
const compacted = await reactiveCompact.compact(messagesForQuery, ...)
state = {
...state,
messages: compacted,
hasAttemptedReactiveCompact: true,
transition: { type: 'reactive_compact' },
}
continue
}
}
// 5. stop_hook: a hook paused continuation — check and resume
if (stopHookResult.shouldContinue) {
state = { ...state, stopHookActive: false, transition: { type: 'stop_hook_resume' } }
continue
}
// 6. tool_use: model emitted tool calls — run them and loop back
if (toolResults.length > 0) {
state = { ...state, messages: updatedMessages, transition: { type: 'tool_use' } }
continue
}
// 7. max_turns not yet reached after autocompact — loop again
if (turnCount < maxTurns && didAutoCompact) {
state = { ...state, transition: { type: 'post_compact_continue' } }
continue
}
// No condition fired — exit
return Terminal({ reason: 'end_turn' })

The distinction between four and seven matters: the four-path framing covers only in-loop reactive recovery. The full seven includes fallback model retry and autocompact circuit breaker behavior that fires before the per-turn error checks.
The generator interface
query() is an AsyncGenerator. The caller drives it with for await (const event of query(params)):
- The caller receives each StreamEvent, Message, and ToolUseSummaryMessage as it is yielded — no batching.
- The loop continues internally without the caller having to know about it.
- When the loop exits, the generator returns a Terminal. The caller reads it with const terminal = await result.value after the iteration completes.
The Terminal carries the reason the loop stopped: end_turn, max_turns, user interruption, or a non-recoverable error. The caller makes no decisions about loop continuation — that is entirely the loop's responsibility.
The transition field and testability
state.transition is set to the reason the previous iteration continued — tool_use, reactive_compact, max_output_tokens_recovery, and so on. It is undefined on the first iteration.
This field exists specifically for tests. Rather than parsing message arrays to verify that a recovery path fired, tests can assert on state.transition. That is the right separation: behavioral assertions without coupling to message content shape.
The same pattern applies to agent design generally. When a state machine has multiple continuation paths, naming them explicitly — as transition does — makes the machine testable, loggable, and debuggable in production.
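As a sketch of that testing pattern, assuming a stub step function in place of the real loop; the types mirror the article's State, but the function name and behavior are invented for illustration:

```typescript
// Sketch: assert on the named transition instead of parsing message arrays.
type Transition = {
  type: "tool_use" | "reactive_compact" | "max_output_tokens_recovery";
};
type State = { messages: unknown[]; transition: Transition | undefined };

// Stub standing in for one loop iteration that hit truncated output
// and continued via the recovery path.
function stepWithTruncatedOutput(state: State): State {
  return { ...state, transition: { type: "max_output_tokens_recovery" } };
}

const after = stepWithTruncatedOutput({ messages: [], transition: undefined });

// Behavioral assertion: the recovery path fired. No coupling to the
// shape or content of the message array.
console.assert(after.transition?.type === "max_output_tokens_recovery");
```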
What this means for agent design
The patterns in query.ts are not Claude Code-specific:
- Separate turn params from turn state. QueryParams is what the caller provides. State is what the loop owns. Never conflate them — the caller should not need to understand loop internals.
- Make the state machine explicit. The Continue transition enum makes recovery paths nameable, testable, and loggable. An implicit if / retry is harder to reason about and impossible to observe.
- Keep the dependency surface minimal. Four method signatures in QueryDeps is achievable and fakes completely. More methods means more coupling, harder tests, and harder portability.
- Layer context management with a fixed order. Five steps that compose in a specific sequence is better than one monolithic "truncate if needed" operation. The ordering is as important as the steps themselves.
- The generator interface is the right boundary. Callers get events as they happen, not a batch result at the end. This is load-bearing for streaming UI — you cannot retrofit streaming onto a function that returns a promise.
Quiz: In Claude Code's query loop, what is the primary purpose of the transition field on the State object?

Difficulty: medium. Hint: the transition field is set at continue sites inside the while(true) loop.

A. It controls which model is called on the next iteration.
Incorrect: model selection is controlled by fallbackModel in QueryParams, not by transition.

B. It records why the previous iteration continued, used by tests to assert recovery paths fired without inspecting message content.
Correct: transition is set to the continue reason (tool_use, reactive_compact, etc.) and is undefined on the first iteration. This lets tests verify recovery behavior without coupling to message array shape.

C. It triggers the next compaction strategy.
Incorrect: compaction strategy is determined by autoCompactTracking and feature flags, not by transition.

D. It is used by the StreamingToolExecutor to order tool results.
Incorrect: the StreamingToolExecutor is independent of transition; it tracks tool calls by their IDs.