Claude Code Architecture: Five Principles for Building Agent Runtimes

Claude Code is not best understood as a CLI that happens to call a model and a few tools.

It is better understood as a local-first runtime host for agentic work. That framing explains the codebase much more cleanly — and once you see it, the surrounding design choices stop looking like overengineering.

The six-layer architecture

Claude Code architecture in one stack

The full source tree resolves into six layers with clear ownership boundaries.

1. Boot the runtime host (cli.tsx → main.tsx → init.ts)
src/cli.tsx / src/main.tsx / src/entrypoints/init.ts
cli.tsx implements waterfall fast-paths and side-effect hoisting — MDM reads and keychain prefetches fire between import statements at T+2ms, before module evaluation completes at T+137ms. main.tsx normalizes launch mode, loads trust state, and composes the full session before the first turn begins.
The API preconnection HEAD request fires at T+150ms, placing a TCP+TLS connection into Bun's keep-alive pool before the first model call. This saves 100–200ms on the first real request — a startup optimization invisible to the user but measurable in session latency.
2. Assemble a governed capability surface
src/commands.ts / src/tools.ts / src/services/mcp/
Commands, tools, permissions, skills, plugins, and MCP servers are all filtered and adapted into a session-specific surface before the agent runs. What the model can see and call is a product of this assembly step.
3. Run a recoverable turn loop (query.ts)
src/query.ts / src/services/tools/
The turn loop is a while(true) loop with seven recovery conditions, a five-step message preprocessing pipeline, and the StreamingToolExecutor, which overlaps tool execution with model output streaming. The loop exits only when Terminal is returned.
QueryDeps has exactly four method signatures: callModel, compact, uuid, and now. That minimal surface is fakeable in tests and covers every external dependency the loop touches. The transition field on State names why each iteration continued, making recovery paths testable without inspecting message content.
4. Project state into a session control plane
src/state/AppStateStore.ts / src/state/Store.ts
AppStateStore (~80 fields, DeepImmutable<T>) tracks tasks, permissions, MCP clients, overlays, notifications, and other runtime objects that outlive any single render. The Store itself is 35 lines with an optional onChange callback. useSyncExternalStore selectors prevent unnecessary re-renders.
5. Render through a custom terminal UI runtime
src/ink/ink.tsx / src/screens/REPL.tsx
The Ink stack handles input, layout, selection, patching, and screen lifecycle. REPL.tsx at 5005 lines mounts 30+ hooks and is the composition root — not a UI file. The QueryGuard state machine prevents race conditions between query reservation and execution.
6. Promote detached work into child runtimes
src/tasks/ / src/tools/AgentTool/
Background shell tasks, local subagents, in-process teammates, and remote sessions all become first-class managed runtime objects with lifecycle, observation, stop semantics, and persistence.

What the full source tree looks like

Full Claude Code source coverage: 56 files

src/
├── cli.tsx
├── main.tsx
├── query.ts
├── Tool.ts
├── tools.ts
├── Task.ts
├── tasks.ts
├── setup.ts
├── replLauncher.tsx
├── coordinator/
│   └── coordinatorMode.ts
├── entrypoints/
│   └── init.ts
├── hooks/
│   ├── useCanUseTool.tsx
│   └── toolPermission/
│       └── handlers/
│           ├── autoHandler.ts
│           ├── interactiveHandler.ts
├── ink/
│   └── ink.tsx
├── query/
│   ├── config.ts
│   ├── tokenBudget.ts
│   └── stopHooks.ts
├── screens/
│   └── REPL.tsx
├── services/
│   ├── analytics/
│   │   ├── index.ts
│   ├── autoDream/
│   │   └── autoDream.ts
│   ├── compact/
│   │   ├── autoCompact.ts
│   │   ├── microCompact.ts
│   │   └── contextCollapse.ts
│   ├── extractMemories/
│   │   └── extractMemories.ts
│   ├── lsp/
│   │   ├── LSPServerInstance.ts
│   │   ├── LSPDiagnosticRegistry.ts
│   ├── mcp/
│   │   ├── config.ts
│   │   ├── client.ts
│   │   ├── ChromeMcpClient.ts
│   ├── relevantMemories/
│   │   └── relevantMemories.ts
│   ├── SessionMemory/
│   │   └── SessionMemory.ts
│   └── tools/
│       ├── toolOrchestration.ts
│       ├── toolExecution.ts
│       ├── StreamingToolExecutor.ts
├── state/
│   ├── AppStateStore.ts
│   └── Store.ts
├── tasks/
│   ├── DreamTask/
│   │   └── DreamTask.ts
│   └── RemoteAgentTask/
│       └── RemoteAgentTask.tsx
├── tools/
│   ├── AgentTool/
│   │   ├── AgentTool.tsx
│   │   └── runAgent.ts
│   ├── BashTool/
│   │   ├── BashTool.tsx
│   │   ├── bashPermissions.ts
│   │   ├── readOnlyValidation.ts
│   ├── FileEditTool/
│   │   └── FileEditTool.ts
│   ├── FileReadTool/
│   │   └── FileReadTool.ts
└── utils/
    ├── memdir.ts
    ├── permissions/
    │   ├── permissions.ts
    │   └── permissionSetup.ts
    ├── processUserInput/
    │   └── processUserInput.ts
    ├── speculation/
    │   ├── speculativeExecute.ts
    ├── swarm/
    │   ├── backends/
    │   │   ├── TmuxBackend.ts
    │   │   ├── ITermBackend.ts
    │   │   └── InProcessBackend.ts
    └── telemetry/
        ├── otel.ts
        ├── betaSessionTracing.ts

Five principles distilled from the source

1. Capability assembly matters as much as model execution

Claude Code spends substantial architecture budget on deciding what capabilities exist in this session, which are visible, which are trusted, and which are allowed right now.

That is why tools.ts, commands.ts, utils/permissions/, and services/mcp/ all feel central rather than peripheral. The product is assembling a governed runtime surface, not exposing raw functions.
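A minimal sketch of what such an assembly step could look like, assuming a hypothetical ToolSpec shape and SessionPolicy. None of these names or fields come from the Claude Code source; the point is only that the model-visible surface is computed per session rather than hard-coded.

```typescript
// Hypothetical shapes for illustration; not taken from the Claude Code source.
interface ToolSpec {
  name: string;
  source: "builtin" | "plugin" | "mcp";
  isEnabled: boolean;
}

interface SessionPolicy {
  trustedSources: ReadonlySet<ToolSpec["source"]>;
  deniedTools: ReadonlySet<string>;
}

// Build the session-specific surface: only enabled, trusted, non-denied tools
// are ever shown to the model.
function assembleToolSurface(all: readonly ToolSpec[], policy: SessionPolicy): ToolSpec[] {
  return all.filter(
    (tool) =>
      tool.isEnabled &&
      policy.trustedSources.has(tool.source) &&
      !policy.deniedTools.has(tool.name),
  );
}

const exampleSurface = assembleToolSurface(
  [
    { name: "Read", source: "builtin", isEnabled: true },
    { name: "Bash", source: "builtin", isEnabled: true },
    { name: "browser_click", source: "mcp", isEnabled: true },
    { name: "Legacy", source: "plugin", isEnabled: false },
  ],
  {
    trustedSources: new Set<ToolSpec["source"]>(["builtin", "plugin"]),
    deniedTools: new Set(["Bash"]),
  },
);
```

In this toy session only Read survives: Bash is denied by policy, the MCP tool comes from an untrusted source, and Legacy is disabled.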

📝 The permission engine as a design document

The seven-step decision tree in hasPermissionsToUseToolInner is a useful template for any agent permission system. The key design points:

Safety checks are bypass-immune (return before the mode check, cannot be cleared by any mode)
Content-specific rules are bypass-resistant (fire before mode check, but below safety checks)
Mode-level grants (bypassPermissions, auto) only apply after all rule and tool-specific checks pass
In auto mode an AI classifier, rather than the user, makes the approval decision, and dangerous allow rules are stripped on entry to auto mode rather than inherited

This is the right model for building permission systems that have both governance and usability.
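The ordering in those design points can be sketched as a single function. This is a simplified illustration, not the seven-step tree itself: the PermissionRequest fields are hypothetical, and the choice to return "ask" from a failed safety check is an assumption.

```typescript
// Simplified ordering sketch; field and type names are hypothetical.
type Decision = "allow" | "deny" | "ask";
type Mode = "default" | "auto" | "bypassPermissions";

interface PermissionRequest {
  failsSafetyCheck: boolean;    // e.g. a write to a protected path
  contentRule?: Decision;       // result of a content-specific rule, if one matched
  toolCheck?: Decision;         // tool-specific verdict, if any
  classifierApproves?: boolean; // stand-in for the auto-mode classifier
}

function decide(req: PermissionRequest, mode: Mode): Decision {
  // 1. Safety checks return first, so no mode can clear them.
  //    (Returning "ask" here is an assumption of this sketch.)
  if (req.failsSafetyCheck) return "ask";
  // 2. Content-specific deny/ask rules also fire before the mode check.
  if (req.contentRule === "deny" || req.contentRule === "ask") return req.contentRule;
  // 3. Tool-specific checks.
  if (req.toolCheck === "deny") return "deny";
  // 4. Only now do mode-level grants apply.
  if (mode === "bypassPermissions") return "allow";
  if (mode === "auto") return req.classifierApproves ? "allow" : "ask";
  return req.contentRule === "allow" || req.toolCheck === "allow" ? "allow" : "ask";
}
```

Because the mode check comes last, `decide({ failsSafetyCheck: true }, "bypassPermissions")` still yields "ask" in this sketch.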

2. The turn loop is the real execution unit

The right execution unit is not a single API request — it is a recoverable turn loop with named continuation reasons.

The seven recovery conditions in query.ts cover the full failure surface: fallback model retry, compaction circuit breaker, max output tokens recovery, prompt-too-long reactive compact, stop hook resume, tool use continuation, and post-compact continuation. Each is named as a transition value so tests can assert which path fired without inspecting message content.
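A compact sketch of the idea, under stated assumptions: the four dependency names come from the text above, but their signatures, the ModelResult variants, the three Transition values, and the iteration limit are invented for illustration. It is synchronous for brevity, where a real loop would be async.

```typescript
// Sketch only: signatures, Transition values, and State shape are assumptions.
type ModelResult =
  | { kind: "done"; text: string }
  | { kind: "tool_use"; tool: string }
  | { kind: "prompt_too_long" };

interface QueryDeps {
  callModel(messages: string[]): ModelResult; // synchronous here for brevity
  compact(messages: string[]): string[];
  uuid(): string;
  now(): number; // unused in this sketch
}

type Transition = "start" | "tool_use_continuation" | "reactive_compact";

interface State {
  messages: string[];
  transition: Transition; // why this iteration is running
}

interface Terminal {
  text: string;
  transitions: Transition[];
}

function runTurn(deps: QueryDeps, prompt: string, maxIterations = 10): Terminal {
  let state: State = { messages: [prompt], transition: "start" };
  const transitions: Transition[] = [];
  let iterations = 0;
  while (true) {
    transitions.push(state.transition);
    if (++iterations > maxIterations) return { text: "(iteration limit)", transitions };
    const result = deps.callModel(state.messages);
    if (result.kind === "done") return { text: result.text, transitions };
    if (result.kind === "prompt_too_long") {
      // Recovery path: shrink the context and go around again.
      state = { messages: deps.compact(state.messages), transition: "reactive_compact" };
      continue;
    }
    // Tool use: record a placeholder result and continue the turn.
    state = {
      messages: [...state.messages, `tool_result:${result.tool}:${deps.uuid()}`],
      transition: "tool_use_continuation",
    };
  }
}
```

With a fake QueryDeps whose callModel replays a scripted sequence, a test can assert on the recorded transitions alone, which is the property the named continuation reasons are meant to provide.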

The five-step message preprocessing pipeline runs before every API call in a load-bearing order: budget → snip → microcompact → context collapse → autocompact. The ordering is not arbitrary — each step can short-circuit the next.
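One way to picture an ordered pipeline whose steps can short-circuit later ones is shown below. The Step interface, the skipRest flag, and the character-count "token" math are all assumptions of this sketch; only the idea of a fixed order with early exit comes from the text.

```typescript
// Step bodies and the size heuristic are placeholders, not the real logic.
interface StepResult {
  messages: string[];
  skipRest?: boolean; // set when this step already brought the context under budget
}
type Step = { name: string; run(messages: string[], budget: number): StepResult };

const size = (messages: string[]): number => messages.reduce((n, m) => n + m.length, 0);

function preprocess(
  messages: string[],
  budget: number,
  steps: Step[],
): { messages: string[]; ran: string[] } {
  const ran: string[] = [];
  let current = messages;
  for (const step of steps) {
    const result = step.run(current, budget);
    ran.push(step.name);
    current = result.messages;
    if (result.skipRest) break; // later, more expensive steps are not needed
  }
  return { messages: current, ran };
}

// Two toy steps: a cheap one that drops old messages, an expensive one that summarizes.
const snip: Step = {
  name: "snip",
  run(messages, budget) {
    const kept = [...messages];
    while (kept.length > 1 && size(kept) > budget) kept.shift();
    return { messages: kept, skipRest: size(kept) <= budget };
  },
};
const autocompact: Step = {
  name: "autocompact",
  run: (messages) => ({ messages: [`summary of ${messages.length} messages`] }),
};
```

When the cheap step is enough, the expensive one never runs, which is why the order matters.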

3. Tool execution is an internal subsystem

Tools are not callbacks hanging off the side of the loop. The tool execution subsystem has scheduling (partitionToolCalls — concurrent vs serial per invocation), governance (permission pipeline with pre/post hooks), and streaming-aware behavior (StreamingToolExecutor starts tools as inputs complete, not after the full response arrives).
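As a rough illustration of the scheduling idea, here is one plausible partitioning rule: consecutive concurrency-safe calls share a batch, and anything else runs alone in order. The function name is borrowed from the text, but the ToolCall shape and the rule itself are assumptions, not the actual implementation.

```typescript
// Hypothetical shapes; the real partitioning rule may differ.
interface ToolCall {
  id: string;
  tool: string;
  concurrencySafe: boolean;
}
interface Batch {
  concurrent: boolean;
  calls: ToolCall[];
}

function partitionToolCalls(calls: readonly ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.concurrencySafe && last !== undefined && last.concurrent) {
      last.calls.push(call); // extend the current concurrent batch
    } else {
      // Unsafe calls get their own serial batch; a safe call after one starts a new batch.
      batches.push({ concurrent: call.concurrencySafe, calls: [call] });
    }
  }
  return batches;
}
```

Given Read, Grep, Edit, Read (with only Edit marked unsafe), this yields three batches: the first two reads together, the edit alone, then the final read.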

BashTool's five defense layers illustrate the design philosophy: tree-sitter AST parsing (fail-closed on unparseable input), read-only validation, path validation, destructive command warnings, and sandbox routing. The speculative pre-check pattern — validating bash permissions before showing the approval dialog — is the kind of UX detail that only emerges from thinking about the full flow.
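The fail-closed property can be shown with a toy layered check. This sketch uses a quote-balance test as a stand-in for real AST parsing and a three-command allowlist as a stand-in for read-only validation; both are deliberately naive placeholders and every name is hypothetical.

```typescript
type Verdict = { ok: true } | { ok: false; reason: string };
type Layer = (command: string) => Verdict;

// Toy stand-in for a real parser: unbalanced double quotes count as "unparseable".
const parses: Layer = (cmd) =>
  (cmd.split('"').length - 1) % 2 === 0 ? { ok: true } : { ok: false, reason: "unparseable" };

// Toy stand-in for read-only validation: only a tiny allowlist passes.
const READ_ONLY = new Set(["ls", "cat", "pwd"]);
const readOnly: Layer = (cmd) =>
  READ_ONLY.has(cmd.trim().split(/\s+/)[0]) ? { ok: true } : { ok: false, reason: "not read-only" };

// Auto-approve only if every layer passes. Any failure, including a layer
// that throws, falls back to "not approved" rather than letting the command through.
function autoApprove(cmd: string, layers: Layer[]): Verdict {
  for (const layer of layers) {
    try {
      const verdict = layer(cmd);
      if (!verdict.ok) return verdict;
    } catch {
      return { ok: false, reason: "layer error" };
    }
  }
  return { ok: true };
}
```

The design choice worth copying is the default: a command the checker cannot understand is treated the same as a command it rejects.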

4. State is a control plane, not just UI bookkeeping

AppStateStore (~80 fields, DeepImmutable<T>) tracks tasks, permissions, MCP clients, overlays, notifications, and other runtime objects that outlive any single render. The Store itself is 35 lines. The key design: useSyncExternalStore selectors prevent unnecessary re-renders, and the optional onChange callback centralizes side effects.
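A store of this kind fits comfortably in a few dozen lines. The following is a minimal sketch, not the actual Store.ts: the createStore name and the onChange signature are assumptions. The subscribe/getState pair matches the contract React's useSyncExternalStore expects (a subscribe function that returns an unsubscribe function, plus a snapshot getter).

```typescript
type Listener = () => void;

interface Store<T> {
  getState(): T;
  setState(updater: (prev: T) => T): void;
  subscribe(listener: Listener): () => void;
}

// Hypothetical factory; the optional onChange hook is the single place for side effects.
function createStore<T>(initial: T, onChange?: (next: T, prev: T) => void): Store<T> {
  let state = initial;
  const listeners = new Set<Listener>();
  return {
    getState: () => state,
    setState(updater) {
      const prev = state;
      const next = updater(prev);
      if (Object.is(next, prev)) return; // same reference: nothing to notify
      state = next;
      onChange?.(next, prev);
      listeners.forEach((listener) => listener());
    },
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
```

Because updates replace the state object instead of mutating it, a selector such as `(s) => s.tasks` returns the same reference until tasks actually changes, which is what lets subscribers skip re-renders.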

The ToolUseContext isolation model is worth noting: sub-agents get a no-op setAppState (so they cannot mutate parent session state), but setAppStateForTasks always pierces to the root Store. Fork agents use byte-exact prompt threading to preserve prompt cache coherence with the parent.
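The isolation rule can be sketched in a few lines. The two setter names come from the text; the AppState fields, the context factories, and the mutable root holder are hypothetical simplifications.

```typescript
// Hypothetical, heavily reduced shapes.
interface AppState {
  tasks: string[];
  theme: string;
}
type Updater = (prev: AppState) => AppState;

interface ToolUseContext {
  setAppState(update: Updater): void;
  setAppStateForTasks(update: Updater): void;
}

function createRootContext(root: { state: AppState }): ToolUseContext {
  const apply = (update: Updater): void => {
    root.state = update(root.state);
  };
  return { setAppState: apply, setAppStateForTasks: apply };
}

function createSubagentContext(parent: ToolUseContext): ToolUseContext {
  return {
    // Sub-agents cannot mutate general session state...
    setAppState: () => {},
    // ...but task registration is forwarded, so it reaches the root at any depth.
    setAppStateForTasks: parent.setAppStateForTasks,
  };
}
```

Nesting createSubagentContext any number of times keeps the same behavior: general writes are dropped, task writes land on the root.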

5. Detached work becomes first-class runtime

Background shell tasks, local subagents, in-process teammates, and remote sessions all become managed runtime objects with lifecycle, observation, stop semantics, and persistence — not hidden implementation details.

The DreamTask pattern is reusable: promote any long-running background operation into a Task, make it observable in the UI, add cancellation. This is why AutoDream appears as a named task rather than silent background work.
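A minimal sketch of that promotion, assuming a hypothetical TaskRegistry (the real Task interface is not shown in this post). The three ingredients are the ones named above: a handle with lifecycle status, a listing a UI could render, and a cancel hook.

```typescript
type TaskStatus = "running" | "completed" | "cancelled";

interface TaskHandle {
  id: string;
  label: string;
  status: TaskStatus;
  cancel(): void;
}

// Hypothetical registry, for illustration only.
class TaskRegistry {
  private tasks = new Map<string, TaskHandle>();
  private nextId = 1;

  // Register background work; `onCancel` is how the owner actually stops it.
  start(label: string, onCancel: () => void): TaskHandle {
    const handle: TaskHandle = {
      id: `task-${this.nextId++}`,
      label,
      status: "running",
      cancel: () => {
        if (handle.status !== "running") return; // cancelling twice is a no-op
        handle.status = "cancelled";
        onCancel();
      },
    };
    this.tasks.set(handle.id, handle);
    return handle;
  }

  complete(id: string): void {
    const task = this.tasks.get(id);
    if (task && task.status === "running") task.status = "completed";
  }

  // What a task list in the UI would render.
  list(): { id: string; label: string; status: TaskStatus }[] {
    return Array.from(this.tasks.values()).map(({ id, label, status }) => ({ id, label, status }));
  }
}
```

Anything started through the registry is visible and stoppable by construction, which is the difference between a managed task and silent background work.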

Non-obvious details that change the mental model

Side-effect hoisting at startup. startMdmRawRead() and startKeychainPrefetch() fire at T+2ms between import statements — before module evaluation completes at T+137ms. The API preconnection HEAD request at T+150ms puts a TCP+TLS connection into Bun's keep-alive pool before any model call is needed. These are startup latency optimizations buried in the import order.
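The general pattern is "start the slow work first, await it later". The sketch below uses a hypothetical startPrefetch helper and an events array to make the ordering visible; it does not reproduce the real startMdmRawRead or startKeychainPrefetch.

```typescript
const events: string[] = [];

// Hypothetical stand-in for a slow read such as a keychain lookup.
function startPrefetch(name: string): Promise<string> {
  events.push(`start:${name}`);
  return Promise.resolve(`${name}-value`);
}

// Kick off the work first...
const keychainPromise = startPrefetch("keychain");

// ...then do the expensive synchronous part (module evaluation in the real case).
function evaluateModules(): void {
  events.push("modules-evaluated");
}
evaluateModules();

// Later, whoever needs the value awaits the already-running promise.
async function getKeychainValue(): Promise<string> {
  const value = await keychainPromise;
  events.push("consumed");
  return value;
}
```

The prefetch is recorded before module evaluation finishes, so by the time a consumer awaits the promise, the I/O has had the whole evaluation window to complete.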

SYSTEM_PROMPT_DYNAMIC_BOUNDARY. A sentinel string that splits the system prompt into a static section (gets a global cache entry) and a dynamic section (null cache header, never cached). Changing anything below the boundary does not invalidate the cached static prefix. This is load-bearing for prompt caching cost.
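A sketch of the splitting step, with a placeholder sentinel value and a hypothetical PromptBlock shape (the real sentinel string and cache header format are not shown in this post):

```typescript
// Placeholder sentinel; the real value is not reproduced here.
const BOUNDARY = "<<DYNAMIC_BOUNDARY>>";

interface PromptBlock {
  text: string;
  cacheable: boolean;
}

function splitSystemPrompt(prompt: string): PromptBlock[] {
  const index = prompt.indexOf(BOUNDARY);
  // No sentinel: treat the whole prompt as static.
  if (index === -1) return [{ text: prompt, cacheable: true }];
  return [
    { text: prompt.slice(0, index), cacheable: true },
    { text: prompt.slice(index + BOUNDARY.length), cacheable: false },
  ];
}
```

Two prompts that differ only after the sentinel produce byte-identical static blocks, which is the condition a prefix cache needs in order to hit.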

Chrome MCP runs in-process. Instead of spawning a ~325MB subprocess for Chrome integration, Claude Code runs the Chrome MCP client in-process. The implementation lives in ChromeMcpClient.ts, and initialization is gated so the cost is paid only when Chrome MCP is configured.

DA1 terminal capability detection. Rather than writing a test string and reading the response (which requires a timeout), Claude Code uses the DA1 (Device Attributes) ANSI escape sequence to probe terminal capabilities synchronously. This is how it detects 256-color and true-color support without a TTY timeout.

The memory system's fifth layer. Most descriptions stop at four memory layers. The fifth — Relevant Memories — fires a lightweight Sonnet side-query before each turn to select at most 5 contextually appropriate memory files from memdir. This is what makes large memdirs scale: you pay only for what is relevant to the current turn.
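The selection step can be sketched with the model call abstracted away. Here the Selector type is a stand-in for the side-query, and MemoryFile is a hypothetical shape; the only detail taken from the text is the cap of five.

```typescript
interface MemoryFile {
  path: string;
  summary: string;
}

// Stand-in for the model side-query: returns the paths it considers relevant, best first.
type Selector = (query: string, candidates: readonly MemoryFile[]) => string[];

const MAX_RELEVANT = 5;

function selectRelevantMemories(
  query: string,
  files: readonly MemoryFile[],
  select: Selector,
): MemoryFile[] {
  const byPath = new Map<string, MemoryFile>();
  for (const file of files) byPath.set(file.path, file);

  const picked: MemoryFile[] = [];
  for (const path of select(query, files)) {
    const file = byPath.get(path); // ignore paths the selector made up
    if (file && picked.indexOf(file) === -1) picked.push(file);
    if (picked.length === MAX_RELEVANT) break;
  }
  return picked;
}
```

Validating the selector's output against the known files and enforcing the cap in code, rather than trusting the side-query to respect it, keeps the per-turn context cost bounded regardless of how large the memory directory grows.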

The main design lesson

Serious agent products are built around runtime architecture, not just reasoning quality.

The model matters, but the harder product questions are:

what gets assembled into the session
what actions are governed
what state persists
how failure recovers
how detached work is controlled

Claude Code is worth studying because the source keeps answering those questions in concrete runtime terms rather than deferring them to "the model will figure it out."

📝 Explore the architecture interactively

The Architecture Lab maps all 14 Claude Code subsystems as an interactive diagram — Boot, Input Router, Query Loop, Permission Engine, Memory System, Task + Swarm, Bridge, and more — with source anchors, tradeoffs, and design rationale for each node.

The API Flow Map visualizes the control-plane flows as sequence diagrams: boot session assembly, normal agent turn, tool execution loop, permission resolution, MCP capability refresh, subagent orchestration, memory context injection, and bridge remote control.

Both are derived from the same source reading this series covers.

A team is building a new AI coding assistant. They plan to focus on prompt quality and model selection first, with 'runtime plumbing' as a later phase. Based on the Claude Code architecture, what is the most important thing they are deferring?

Claude Code's architecture is organized around six layers: boot/session assembly, capability governance, recoverable turn loop, session control plane, terminal UI, and child runtimes.

A. The terminal UI renderer: a good TUI is what separates professional tools from scripts
Incorrect. The UI matters, but it is a projection of state. A team could build a solid agent product with a simpler UI if the underlying runtime is well-designed.
B. Capability governance and failure recovery: what gets assembled into the session, how actions are authorized, and how the turn loop recovers are load-bearing before model quality shows up
Correct. The Claude Code source shows that the hardest product problems are: what capabilities exist, which are trusted, how failure recovers, and how detached work is controlled. These shape the core execution path from the start and cannot easily be bolted on after. The recoverable turn loop, permission engine, and task runtime are not 'plumbing'; they are the product.
C. Model selection: choosing the wrong model will make all runtime work pointless
Incorrect. Model selection matters, but the runtime layer is the reason the same model produces a reliable, governable product. The runtime is not downstream of the model choice.
D. Subagent support: multi-agent patterns are the main differentiator for serious tools
Incorrect. Subagents are valuable, but they require the single-agent runtime to be solid first. Claude Code's task and subagent runtime is built on top of a well-designed base loop, not instead of it.
