Claude Code Observability: Dual-Sink Analytics, OTEL Spans, Perfetto Traces

3 min read · AI Agents

How dual-sink event routing, compile-time PII enforcement, full-turn OTEL instrumentation, and Perfetto/Chrome Trace export give Claude Code production-grade observability from a local-first runtime.

ai-agents, claude-code, observability, analytics, opentelemetry, tracing

Observability in a local-first agent runtime is harder than on a server: there is no always-on backend to receive events, sessions may be offline, and PII lives on the user's machine.

Claude Code has a layered solution: dual-sink analytics, OpenTelemetry spans, BigQuery output, and Perfetto traces for deep performance analysis.

File structure

Files covered in this post (6 files)
src/
├── services/
│   └── analytics/
│       ├── index.ts
│       ├── types.ts
│       └── bucketHash.ts
└── utils/
    └── telemetry/
        ├── otel.ts
        ├── betaSessionTracing.ts
        └── perfetto.ts

src/services/analytics/

Dual-sink event emission

High

Routes analytics events to two sinks simultaneously: Datadog (product metrics) and a first-party Anthropic sink. PII fields are marked with a _PROTO_ prefix convention. User bucket hashing ensures consistent A/B assignment. The type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS is a never-type marker enforcing that metadata values are not code or file paths.

Module
Analytics + Observability

Key Exports

  • trackEvent
  • AnalyticsEvent
  • AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS

Why It Matters

  • The never-type marker on metadata enforces PII policy at compile time: unverified metadata does not type-check, so a file path cannot reach an event without an explicit, reviewable cast.
  • Dual-sink means product metrics (Datadog) and usage analytics (1P sink) are fed from the same event stream, so no routing logic separates what the team sees from what billing sees.
  • User bucket hashing for A/B assignment means feature flag behavior is consistent across sessions for the same user.
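
A minimal sketch of stable bucket assignment, under stated assumptions. FNV-1a is used here only because it fits in a few dependency-free lines; it stands in for whatever hash the codebase actually uses. The helper names `fnv1a32` and `isInRollout` are hypothetical.

```typescript
// FNV-1a (32-bit), used as a stand-in hash for illustration only.
// It hashes UTF-16 code units rather than bytes, which is fine for a sketch.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5 // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    // Math.imul keeps the multiplication in 32-bit integer space.
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0 // force unsigned so the modulo below is never negative
}

// Maps a user id to a bucket in [0, bucketCount). Same input, same bucket.
function getUserBucket(userId: string, bucketCount = 100): number {
  return fnv1a32(userId) % bucketCount
}

// Hypothetical rollout check: a flag is on for the first `percent` buckets.
function isInRollout(userId: string, percent: number): boolean {
  return getUserBucket(userId) < percent
}
```

Because the bucket depends only on the user id, raising a rollout from 10% to 20% keeps every user who was already in the first 10 buckets enabled.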

src/utils/telemetry/

OpenTelemetry spans and BigQuery output

High

Instruments the query loop, tool execution, and MCP calls with OTEL spans. Exports to BigQuery for aggregate analysis. Includes a beta session tracer that extracts <system-reminder> tags as separate span attributes for structured analysis of system prompt injection.

Module
Analytics + Observability

Key Exports

  • startSpan
  • recordToolExecution
  • BetaSessionTracer
  • exportToBigQuery

Why It Matters

  • OTEL spans wrap the full turn execution, not just the model call — tool execution, permission checks, and MCP calls all generate spans.
  • Because the beta session tracer records <system-reminder> tags as separate attributes, system prompt injection is observable in BigQuery without manually parsing raw transcripts.
  • BigQuery export enables aggregate analysis across millions of sessions while keeping raw transcripts local.

src/utils/telemetry/betaSessionTracing.ts

Beta feature tracing with system-reminder extraction

Medium

492-line beta tracer that intercepts conversation messages, extracts <system-reminder> tags as named span attributes, and records feature flag states alongside turn spans. Enables analysis of how beta features and system prompt injections correlate with turn outcomes.

Module
Analytics + Observability

Key Exports

  • BetaSessionTracer
  • extractSystemReminders
  • recordBetaSpan

Why It Matters

  • Extracting <system-reminder> tags as attributes means prompt injection points are structured data, not buried in raw text.
  • Correlating feature flags with turn outcomes in the same span tree enables direct attribution of behavior changes to specific beta features.
  • At 492 lines, this is a production-grade tracer, not a debug util — it reflects how seriously Anthropic instruments beta behavior.
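
The extraction step might look roughly like the sketch below. Only the export name `extractSystemReminders` comes from the post; its signature, the return shape, and the `system_reminder.*` attribute keys are assumptions made for illustration.

```typescript
interface ExtractedReminders {
  reminders: string[]   // inner text of each <system-reminder> tag, trimmed
  remainingText: string // the message with the tags removed
}

// Assumed signature: takes raw message text, returns the tags and the rest.
function extractSystemReminders(message: string): ExtractedReminders {
  // [\s\S]*? matches across newlines and stops at the first closing tag.
  const pattern = /<system-reminder>([\s\S]*?)<\/system-reminder>/g
  const reminders: string[] = []
  const remainingText = message
    .replace(pattern, (_match: string, body: string) => {
      reminders.push(body.trim())
      return ''
    })
    .trim()
  return { reminders, remainingText }
}

// Flattens reminders into span attributes (hypothetical key names).
function toSpanAttributes(reminders: string[]): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    'system_reminder.count': reminders.length,
  }
  reminders.forEach((text, i) => {
    attrs[`system_reminder.${i}`] = text
  })
  return attrs
}
```

The resulting record can be passed to a span's attribute setter, so each reminder becomes a queryable column rather than a substring of a transcript.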

The dual-sink design

Analytics and observability pipeline

From agent event to Datadog metric and BigQuery aggregate.

  1. Emit typed event

    services/analytics/

    trackEvent() accepts a typed AnalyticsEvent. Metadata values are enforced by the never-type marker to not contain code or file paths — a compile-time PII guard.

  2. Route to dual sinks

    services/analytics/ (sink routing)

    Each event is sent to both Datadog (product metrics dashboard) and the Anthropic 1P sink (usage analytics, billing). Both sinks receive every event — no routing logic that could cause divergence.

  3. Hash user bucket for A/B

    services/analytics/ (user bucket hashing)

    User identity is hashed to a stable bucket for consistent A/B assignment. The same user lands in the same bucket across sessions, so feature flag behavior does not jitter.

  4. Instrument turn with OTEL spans

    utils/telemetry/

    The query loop, tool execution, permission checks, and MCP calls each open OTEL spans. The full turn becomes a traceable span tree.

  5. Extract system-reminder tags

    utils/telemetry/betaSessionTracing.ts

    The beta tracer intercepts messages and pulls <system-reminder> tags into named span attributes. System prompt injection is now structured observability data.

  6. Export to BigQuery

    utils/telemetry/ (BigQuery export)

    Span trees are exported to BigQuery for aggregate analysis. Raw transcripts stay local; only structured span data leaves the machine.

The PII enforcement design

The type name AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS is one of the most deliberate design choices in the codebase.

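It is a never-type marker: a TypeScript type to which no ordinary value is assignable, so the only way to produce one is an explicit cast, and the name of the cast's target type states what the author is attesting to.

The consequence: metadata that has not been explicitly marked as verified fails to compile, so a file path or code snippet cannot slip into an event unnoticed. The compiler cannot inspect what a string contains; what it guarantees is that nothing gets through without a visible cast.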

This matters because analytics metadata ends up in Datadog dashboards and BigQuery tables that may be accessible to many people. If code or file paths accidentally end up there, they could expose source code, credentials in file names, or personal paths.

The verification type forces every call site to explicitly attest that its metadata is safe.
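
A minimal sketch of how a never-type marker can force that attestation, assuming the marker is a plain alias for `never` as described above. The `EventMetadata` shape and the `logEvent` function are hypothetical and exist only to make the example self-contained.

```typescript
// Assumption: the marker is an alias for `never`, as the prose describes.
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never

// Hypothetical metadata shape: plain strings are deliberately not allowed.
type EventMetadata = { [key: string]: number | boolean | undefined }

const emitted: Array<{ name: string; metadata: EventMetadata }> = []

// Hypothetical event logger; it only records what it is given.
function logEvent(name: string, metadata: EventMetadata): void {
  emitted.push({ name, metadata })
}

// Numbers and booleans type-check without ceremony.
logEvent('tool_used', { durationMs: 120, success: true })

// A bare string does not compile: 'string' is not assignable to
// 'number | boolean | undefined'.
// logEvent('tool_used', { toolName: 'bash' })

// The only way through is a cast whose type name spells out the attestation.
// `never` is assignable to every type, so the cast value fits EventMetadata.
logEvent('tool_used', {
  toolName: 'bash' as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
})
```

The cast changes nothing at runtime; its value is that every string reaching analytics carries a greppable marker that a reviewer can challenge. The simplified listing later in this post models the same idea with a branded object type instead.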

Why OTEL spans wrap more than the model call

A common pattern in agent instrumentation is to put a span only around the model API call — measure latency, record token counts, done.

Claude Code wraps a much wider surface: the full turn, including tool execution, permission checks, MCP client calls, and task spawning.

This matters because in a real agent session, the model call is often not the slowest part. A tool that runs a shell command, a permission prompt waiting for user input, an MCP server that times out — these all appear in the span tree, giving a complete picture of where a slow turn actually spent its time.
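
To make the point concrete, here is a dependency-free sketch of a span tree for one turn. It is not the OpenTelemetry API: `SpanRecorder`, `slowestChild`, the span names, and the injected clock are all hypothetical, chosen so the example runs deterministically.

```typescript
interface RecordedSpan {
  name: string
  durationMs: number
  children: RecordedSpan[]
}

// Records nested spans using a stack, so it only handles sequential work.
class SpanRecorder {
  readonly roots: RecordedSpan[] = []
  private open: Array<{ span: RecordedSpan; startedAt: number }> = []
  private now: () => number

  constructor(now: () => number = () => Date.now()) {
    this.now = now
  }

  start(name: string): void {
    const span: RecordedSpan = { name, durationMs: 0, children: [] }
    const parent = this.open[this.open.length - 1]
    if (parent) parent.span.children.push(span)
    else this.roots.push(span)
    this.open.push({ span, startedAt: this.now() })
  }

  end(): void {
    const top = this.open.pop()
    if (!top) throw new Error('end() called with no open span')
    top.span.durationMs = this.now() - top.startedAt
  }
}

// Returns the direct child that took the longest, if there is one.
function slowestChild(span: RecordedSpan): RecordedSpan | undefined {
  return span.children.reduce<RecordedSpan | undefined>(
    (worst, child) => (!worst || child.durationMs > worst.durationMs ? child : worst),
    undefined,
  )
}

// Simulated turn with a fake clock (milliseconds).
let clock = 0
const recorder = new SpanRecorder(() => clock)
recorder.start('turn.execute')
recorder.start('model.call'); clock += 800; recorder.end()
recorder.start('tool.bash'); clock += 2500; recorder.end()
recorder.start('permission.prompt'); clock += 4000; recorder.end()
recorder.end()

const turn = recorder.roots[0]!
const bottleneck = slowestChild(turn)
```

In this simulated turn the model call accounts for 800 ms of a 7300 ms turn; a span around the model call alone would have missed the permission prompt entirely.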

Perfetto and Chrome Trace format

The telemetry layer also supports Perfetto trace export in Chrome Trace format.

Perfetto is Google's open-source tracing and profiling toolkit. The Chrome Trace format is JSON that can be opened in chrome://tracing, the Perfetto UI, and Speedscope.

This is not for production monitoring — it is for deep performance analysis of individual sessions. A developer investigating a slow turn can export a Perfetto trace and see exactly how the span tree corresponds to CPU and wall-clock time.
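
A sketch of what the conversion to Chrome Trace format can look like, assuming spans are available as simple start/end records. The `FinishedSpan` shape and `toChromeTrace` are hypothetical; the event fields follow the Trace Event Format, where `ph: 'X'` marks a complete event and `ts` and `dur` are in microseconds.

```typescript
// Hypothetical input shape: one finished span with millisecond timestamps.
interface FinishedSpan {
  name: string
  startMs: number
  endMs: number
}

// One "complete" event in the Chrome Trace Event Format.
interface TraceEvent {
  name: string
  ph: 'X'     // complete event: has both a start and a duration
  ts: number  // start time in microseconds
  dur: number // duration in microseconds
  pid: number
  tid: number
}

function toChromeTrace(spans: FinishedSpan[]): { traceEvents: TraceEvent[] } {
  return {
    traceEvents: spans.map((s): TraceEvent => ({
      name: s.name,
      ph: 'X',
      ts: Math.round(s.startMs * 1000),
      dur: Math.round((s.endMs - s.startMs) * 1000),
      pid: 1, // a single process and thread are enough for a sketch
      tid: 1,
    })),
  }
}

const traceJson = JSON.stringify(
  toChromeTrace([
    { name: 'turn.execute', startMs: 0, endMs: 3300 },
    { name: 'model.call', startMs: 0, endMs: 800 },
    { name: 'tool.bash', startMs: 800, endMs: 3300 },
  ]),
)
```

Writing `traceJson` to a `.json` file gives something that can be opened in the Perfetto UI or chrome://tracing, where events on the same thread are nested by time containment.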

Analytics dual-sink and OTEL span pattern

The PII-enforced event emission and span-wrapped turn execution. The TypeScript below shows the shape in simplified form.

TypeScript: services/analytics/ + utils/telemetry/ (simplified)

Never-type PII marker and dual-sink routing — actual pattern from learning-claude-code

import { trace, SpanStatusCode } from '@opentelemetry/api'

// Assumed to exist elsewhere; declared here only so the listing type-checks.
declare function murmurhash(key: string): number // assumed: unsigned 32-bit hash
declare function runTurn(input: string, context: SessionContext): Promise<TurnResult>
interface SessionContext { sessionId: string }
interface TurnResult { toolCallCount: number }
interface Sink { emit(event: AnalyticsEvent): Promise<void> }
declare const datadogSink: Sink
declare const firstPartySink: Sink

// The tracer name is illustrative.
const tracer = trace.getTracer('claude-code')

// PII enforcement: this type can only be assigned by explicit verification.
// The verbose name is the enforcement; it is not an accident.
// (Modeled here as a branded object type.)
declare const _VERIFIED: unique symbol
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = {
  readonly [_VERIFIED]: true
  [key: string]: string | number | boolean
}

function verifyMetadata(
  m: Record<string, string | number | boolean>
): AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS {
  // Call site must explicitly invoke this; the type name is the documentation.
  return m as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS
}

interface AnalyticsEvent {
  name: string
  metadata: AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS
  userId?: string
}

// Dual sink: both sinks are always offered the same event.
// allSettled means a failure in one sink does not block the other.
async function trackEvent(event: AnalyticsEvent): Promise<void> {
  await Promise.allSettled([
    datadogSink.emit(event),
    firstPartySink.emit(event),
  ])
}

// User bucket hashing for stable A/B assignment
function getUserBucket(userId: string, bucketCount = 100): number {
  const hash = murmurhash(userId)
  return hash % bucketCount
}

// OTEL span wrapping the full turn, not just the model call
async function runTurnWithSpan(
  input: string,
  context: SessionContext
): Promise<TurnResult> {
  return tracer.startActiveSpan('turn.execute', async (span) => {
    span.setAttributes({
      'turn.input_length': input.length,
      'session.id': context.sessionId,
    })

    try {
      const result = await runTurn(input, context)
      span.setStatus({ code: SpanStatusCode.OK })
      span.setAttributes({ 'turn.tool_calls': result.toolCallCount })
      return result
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR })
      span.recordException(err as Error)
      throw err // rethrow so callers still see the failure
    } finally {
      span.end() // always close the span, on success or failure
    }
  })
}

Claude Code uses a type named AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS as a compile-time PII guard. What would be the consequence of enforcing this policy with a runtime check (e.g., a regex that rejects strings containing slashes) instead?

medium

Analytics events end up in Datadog dashboards and BigQuery tables accessible to many people. File paths and code snippets in event metadata could expose sensitive information.

  • A. Runtime checks are faster: the never-type approach adds TypeScript compilation overhead
    Incorrect. TypeScript type checking happens at compile time, not at runtime. It adds no overhead to the running program.
  • B. A runtime regex would catch different patterns in different languages and environments, making it unreliable for a cross-platform tool
    Incorrect. This is a secondary concern. The more fundamental issue is that a runtime check fires (by dropping the event or throwing) only once the code is running, possibly in production, while the compile-time approach prevents the mistake from being written at all.
  • C. Runtime checks fail at the call site after the code is written and deployed: they catch mistakes in production rather than at the point where the code is authored, and they can still be bypassed by converting values to safe-looking strings
    Correct. A runtime check means the code compiles and ships even with the wrong metadata. The check then fires when the event is actually emitted, possibly in a session months after the code was written. With the never-type marker, the TypeScript compiler rejects the problematic call at the moment the developer writes it, before any code ships. Additionally, a runtime string check can be trivially bypassed by encoding a path differently (URL encoding, base64). The type-based approach cannot be bypassed without an explicit cast that documents the override.
  • D. Runtime checks would prevent legitimate metadata like version numbers that contain dots (which look like file extensions)
    Incorrect. This describes a false positive risk in the check's pattern, not a fundamental difference between compile-time and runtime enforcement.