6.3 Telemetry & Observability: OpenTelemetry in Practice

Goal: understand how Claude Code implements full-stack observability via OpenTelemetry, and why metric design matters.


Why telemetry matters

As an AI product, Claude Code constantly needs answers to questions about how it is used, what each session costs, and where time is spent.

Telemetry infrastructure exists to answer these.


Full OpenTelemetry integration

Claude Code uses the complete OTel stack:

// initialized in src/bootstrap/state.ts

All three observability pillars (metrics, traces, and logs) are covered.


Core metric design

src/bootstrap/state.ts defines the OTel counters and the session-level aggregates:

type State = {
  meter: Meter | null

  // business counters
  sessionCounter: AttributedCounter | null
  locCounter: AttributedCounter | null
  prCounter: AttributedCounter | null
  commitCounter: AttributedCounter | null
  costCounter: AttributedCounter | null
  tokenCounter: AttributedCounter | null
  codeEditToolDecisionCounter: AttributedCounter | null
  activeTimeCounter: AttributedCounter | null

  // aggregate performance
  totalCostUSD: number
  totalAPIDuration: number
  totalAPIDurationWithoutRetries: number
  totalToolDuration: number

  // turn-level metrics
  turnToolDurationMs: number
  turnHookDurationMs: number
  turnClassifierDurationMs: number
  turnToolCount: number
  turnHookCount: number

  // code churn
  totalLinesAdded: number
  totalLinesRemoved: number
}

Why turn-level metrics matter:
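
One plausible reading: session totals only ever grow, so they cannot show that a single turn was slow or which stage (tools, hooks, classifier) made it slow. Per-turn fields can. The sketch below illustrates that split. It assumes the turn fields are reset when a turn ends, which this section does not confirm; `emptyTurn`, `recordToolCall`, and `endTurn` are hypothetical helpers, and only the field names come from the State type above.

```typescript
// Hypothetical subset of State; only the field names come from the source.
type TurnMetrics = {
  turnToolDurationMs: number
  turnHookDurationMs: number
  turnClassifierDurationMs: number
  turnToolCount: number
  turnHookCount: number
}

type SessionTotals = { totalToolDuration: number }

function emptyTurn(): TurnMetrics {
  return {
    turnToolDurationMs: 0,
    turnHookDurationMs: 0,
    turnClassifierDurationMs: 0,
    turnToolCount: 0,
    turnHookCount: 0,
  }
}

// One tool call advances both the per-turn counters and the session total.
function recordToolCall(
  turn: TurnMetrics,
  session: SessionTotals,
  durationMs: number,
): void {
  turn.turnToolDurationMs += durationMs
  turn.turnToolCount += 1
  session.totalToolDuration += durationMs
}

// Assumption: ending a turn snapshots the per-turn fields, then zeroes them.
function endTurn(turn: TurnMetrics): TurnMetrics {
  const snapshot = { ...turn }
  const zero = emptyTurn()
  turn.turnToolDurationMs = zero.turnToolDurationMs
  turn.turnHookDurationMs = zero.turnHookDurationMs
  turn.turnClassifierDurationMs = zero.turnClassifierDurationMs
  turn.turnToolCount = zero.turnToolCount
  turn.turnHookCount = zero.turnHookCount
  return snapshot
}

const sessionTotals: SessionTotals = { totalToolDuration: 0 }
const currentTurn = emptyTurn()

recordToolCall(currentTurn, sessionTotals, 120)
recordToolCall(currentTurn, sessionTotals, 80)
const firstTurn = endTurn(currentTurn)

recordToolCall(currentTurn, sessionTotals, 4000)
const secondTurn = endTurn(currentTurn)
```

Here the session total alone would report 4200 ms of tool time, while the two snapshots show that nearly all of it came from a single call in the second turn.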


AttributedCounter: dimensional counters

type AttributedCounter = {
  add(value: number, additionalAttributes?: Attributes): void
}

"Attributed" means each increment can carry dimensions:

// token usage with model dimension
tokenCounter?.add(usage.inputTokens + usage.outputTokens, {
  model: modelName,
  type: 'input_output',
  querySource: 'repl',
})

// code-edit tool decision breakdown
codeEditToolDecisionCounter?.add(1, {
  decision: 'accepted', // accepted | rejected | modified
  tool: 'str_replace_based_edit',
})

This enables grouped analysis by model, tool, decision, and source.
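
To make the grouping concrete, here is a minimal in-memory sketch of a counter with the same `add` signature. It is not Claude Code's implementation or the OTel SDK: `createAttributedCounter`, `CounterSample`, `sumBy`, the base-attribute merge, and the model names are all assumptions made for illustration.

```typescript
// Local stand-in for the OTel Attributes type (assumption, simplified).
type Attributes = Record<string, string | number | boolean>

type AttributedCounter = {
  add(value: number, additionalAttributes?: Attributes): void
}

type CounterSample = { value: number; attributes: Attributes }

// Hypothetical factory: every increment is stamped with base attributes
// merged with the per-call attributes, then appended to a sink array.
function createAttributedCounter(
  baseAttributes: Attributes,
  sink: CounterSample[],
): AttributedCounter {
  return {
    add(value: number, additionalAttributes: Attributes = {}): void {
      sink.push({
        value,
        attributes: { ...baseAttributes, ...additionalAttributes },
      })
    },
  }
}

// Sums recorded values grouped by a single attribute key.
function sumBy(samples: CounterSample[], key: string): Record<string, number> {
  const totals: Record<string, number> = {}
  for (const { value, attributes } of samples) {
    const group = String(attributes[key] ?? 'unknown')
    totals[group] = (totals[group] ?? 0) + value
  }
  return totals
}

const counterSamples: CounterSample[] = []
const sketchTokenCounter = createAttributedCounter({ sessionId: 's-1' }, counterSamples)

sketchTokenCounter.add(1200, { model: 'model-a', querySource: 'repl' })
sketchTokenCounter.add(300, { model: 'model-b', querySource: 'repl' })
sketchTokenCounter.add(500, { model: 'model-a', querySource: 'sdk' })

const tokensByModel = sumBy(counterSamples, 'model')
const tokensBySource = sumBy(counterSamples, 'querySource')
```

The same three increments can be sliced by `model` or by `querySource` without changing the call sites, which is the point of attaching dimensions at increment time.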


statsStore: profiling-oriented observations

type State = {
  statsStore: { observe(name: string, value: number): void } | null
}

statsStore captures lightweight timing observations that are a poor fit for stable OTel counters:

statsStore?.observe('growthbook_cold_start_ms', duration)
statsStore?.observe('system_prompt_build_ms', duration)
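
The section shows only the `observe` signature, so how observations are stored and summarized is not specified. As a rough sketch, an in-memory store could keep raw samples per name and compute a nearest-rank percentile on demand. `createStatsStore`, `percentile`, and the sample values are hypothetical.

```typescript
type StatsStore = {
  observe(name: string, value: number): void
  // Hypothetical addition for this sketch; not part of the source type.
  percentile(name: string, p: number): number | undefined
}

// Hypothetical in-memory store: keeps every observation, grouped by name.
function createStatsStore(): StatsStore {
  const samplesByName: Record<string, number[] | undefined> = {}
  return {
    observe(name: string, value: number): void {
      const existing = samplesByName[name]
      if (existing) {
        existing.push(value)
      } else {
        samplesByName[name] = [value]
      }
    },
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of the samples are less than or equal to it.
    percentile(name: string, p: number): number | undefined {
      const values = samplesByName[name]
      if (!values || values.length === 0) return undefined
      const sorted = values.slice().sort((a, b) => a - b)
      const rank = Math.max(1, Math.ceil((p / 100) * sorted.length))
      return sorted[rank - 1]
    },
  }
}

const sketchStats = createStatsStore()
sketchStats.observe('system_prompt_build_ms', 10)
sketchStats.observe('system_prompt_build_ms', 30)
sketchStats.observe('system_prompt_build_ms', 20)
sketchStats.observe('system_prompt_build_ms', 40)

const buildP50 = sketchStats.percentile('system_prompt_build_ms', 50)
const buildP100 = sketchStats.percentile('system_prompt_build_ms', 100)
```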

Telemetry data-protection conventions

Claude Code enforces strict guardrails on telemetry payloads.

Naming contract: _I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS

type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = {
  [key: string]: string | number | boolean | null
}

  name: string,
  metadata: AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
): void

This deliberately verbose type name acts as a coding-time safety reminder: do not send source code or file paths.
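
A call site might look like the sketch below. `logEventSketch`, `recordedEvents`, the event name, and the metadata keys are hypothetical, since this section does not show the real function name. Note that with the type as defined above, the compiler cannot check the claim in the name; the annotation simply forces the author to write it out, where it is easy to spot in review.

```typescript
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = {
  [key: string]: string | number | boolean | null
}

type RecordedEvent = {
  eventName: string
  metadata: AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS
}

// Hypothetical sink standing in for the real analytics pipeline.
const recordedEvents: RecordedEvent[] = []

// Hypothetical logger with the same parameter shape as the signature above.
function logEventSketch(
  eventName: string,
  metadata: AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
): void {
  recordedEvents.push({ eventName, metadata })
}

// Writing the annotation is the "I verified" step: only counts, flags,
// and durations go in, never file contents or paths.
const turnMetadata: AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = {
  toolCount: 3,
  succeeded: true,
  durationMs: 412,
}

logEventSketch('turn_completed', turnMetadata)
```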

Naming convention: _NO_PII_IN_THESE_PARAMS

  level: 'info' | 'warn' | 'error',
  event: string,
  params?: Record<string, unknown>,
): void

The no-PII marker in the name indicates that these parameters are logged and must not contain personally identifiable information (PII).


Datadog integration

Telemetry events are ultimately exported to Datadog, gated by a feature flag:

const shouldLogDatadog = getFeatureValue_CACHED_MAY_BE_STALE('tengu_log_datadog_events')
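
The `_CACHED_MAY_BE_STALE` suffix suggests the flag is read from a local cache rather than fetched on every call. The sketch below illustrates that pattern under that assumption. `flagCache`, `getFeatureValueCachedSketch`, `maybeExportSketch`, and the event names are hypothetical; only the flag name comes from the line above.

```typescript
// Hypothetical local cache; a background refresh would normally fill it.
const flagCache: Record<string, boolean | undefined> = {}

// Reads the cached value without any network call, so the result may lag
// behind the remote configuration. Unknown flags fall back to a default.
function getFeatureValueCachedSketch(flagName: string, fallback = false): boolean {
  return flagCache[flagName] ?? fallback
}

// Hypothetical exporter: events are forwarded only while the flag is on.
const exportedEvents: string[] = []

function maybeExportSketch(eventName: string): boolean {
  if (!getFeatureValueCachedSketch('tengu_log_datadog_events')) {
    return false
  }
  exportedEvents.push(eventName)
  return true
}

// Before the cache is populated the flag reads as off, so nothing is sent.
const exportedBeforeRefresh = maybeExportSketch('startup')

// Simulate a refresh that turns the flag on.
flagCache['tengu_log_datadog_events'] = true
const exportedAfterRefresh = maybeExportSketch('turn_completed')
```

The trade-off is that a stale read can drop or send a few events around a flag change, in exchange for never blocking the hot path on a flag lookup.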

headlessProfilerCheckpoint: startup profiling

// src/utils/headlessProfiler.ts

Used in critical paths like QueryEngine.submitMessage():

headlessProfilerCheckpoint('before_getSystemPrompt')
// ... fetchSystemPromptParts() ...
headlessProfilerCheckpoint('after_getSystemPrompt')

In headless mode, the time elapsed between consecutive checkpoints is recorded so that latency distributions can be analyzed.
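
The delta computation can be sketched as follows. This is not the code in src/utils/headlessProfiler.ts, which this section does not show: `createCheckpointProfiler`, the injectable clock, and the third checkpoint name are assumptions chosen so the example is deterministic.

```typescript
type Checkpoint = { label: string; atMs: number }
type CheckpointDelta = { from: string; to: string; ms: number }

// Hypothetical profiler. The clock is passed in so the sketch can run
// against fixed timestamps instead of the real time source.
function createCheckpointProfiler(now: () => number) {
  const checkpoints: Checkpoint[] = []
  return {
    checkpoint(label: string): void {
      checkpoints.push({ label, atMs: now() })
    },
    // Elapsed time between each pair of consecutive checkpoints.
    deltas(): CheckpointDelta[] {
      const result: CheckpointDelta[] = []
      for (let i = 1; i < checkpoints.length; i++) {
        const prev = checkpoints[i - 1]
        const curr = checkpoints[i]
        result.push({ from: prev.label, to: curr.label, ms: curr.atMs - prev.atMs })
      }
      return result
    },
  }
}

// Fixed timestamps stand in for a real clock.
const fakeTimestamps = [100, 135, 160]
let clockIndex = 0
const sketchProfiler = createCheckpointProfiler(() => fakeTimestamps[clockIndex++])

sketchProfiler.checkpoint('before_getSystemPrompt')
sketchProfiler.checkpoint('after_getSystemPrompt')
sketchProfiler.checkpoint('before_firstRequest') // hypothetical checkpoint name

const checkpointDeltas = sketchProfiler.deltas()
```

Collecting these deltas across many headless runs is what would make a latency distribution per stage possible.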


Next