3.2 query.ts: Source-Level Deep Dive of the Main Loop

Chapter goal: Understand the core logic of the queryLoop function line by line, and master how the agent main loop is implemented.

Function signature: async generator

// src/query.ts
  params: QueryParams,
): AsyncGenerator<StreamEvent | RequestStartEvent | Message | ...> {
  const consumedCommandUuids: string[] = []
  const terminal = yield* queryLoop(params, consumedCommandUuids)
  // ...
  return terminal
}

query() is an outer wrapper around queryLoop(), mainly responsible for command-lifecycle wrap-up notifications. The core logic is inside queryLoop().

QueryParams: the "ticket" into the loop

  messages: Message[]              // history messages (including current user input)
  systemPrompt: SystemPrompt       // system prompt
  userContext: { [k: string]: string }  // user context injected into System Prompt
  systemContext: { [k: string]: string } // system context injected into System Prompt
  canUseTool: CanUseToolFn         // permission check function
  toolUseContext: ToolUseContext   // all context needed for tool execution
  fallbackModel?: string           // fallback model (e.g., Sonnet downgraded to Haiku)
  querySource: QuerySource         // request source (REPL / SDK / headless)
  maxOutputTokensOverride?: number // override max output tokens
  maxTurns?: number                // maximum loop turns
  taskBudget?: { total: number }   // API-layer task budget (beta)
}

Loop state: State object

type State = {
  messages: Message[]                   // ever-growing message array
  toolUseContext: ToolUseContext         // tool context (may be updated during loop)
  autoCompactTracking: ...              // auto-compaction state tracking
  maxOutputTokensRecoveryCount: number  // max_output_tokens recovery counter (up to 3)
  hasAttemptedReactiveCompact: boolean  // whether reactive compaction was attempted
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<...>   // tool-use summary (generated in background)
  stopHookActive: boolean | undefined   // whether Stop Hook is active
  turnCount: number                     // current turn count
  transition: Continue | undefined      // reason for continuing from last iteration
}

Key design: State uses immutable-update style:

// Not this:
state.messages = newMessages

// But this:
state = { ...state, messages: newMessages, transition: continueReason }

This makes state changes traceable and easier to debug.

Main loop skeleton

async function* queryLoop(params, consumedCommandUuids) {
  let state = { messages: params.messages, ... }

  while (true) {
    const { messages, toolUseContext } = state

    // ① Build API request
    const model = getRuntimeMainLoopModel(...)
    const apiMessages = normalizeMessagesForAPI(messages)
    
    // ② Yield request-start event (so UI knows request started)
    yield { type: 'request_start', model }

    // ③ Call Claude API in streaming mode
    for await (const event of callClaudeAPI({ model, messages: apiMessages, ... })) {
      yield event  // yield to UI in real time
    }

    // ④ Get final AssistantMessage
    const assistantMessage = getLastAssistantMessage(messages)

    // ⑤ Check exit condition
    if (assistantMessage.stop_reason === 'end_turn') {
      return { type: 'end_turn', ... }  // exit loop
    }

    // ⑥ Handle tool calls
    if (assistantMessage.stop_reason === 'tool_use') {
      const toolResults = yield* runTools(assistantMessage, toolUseContext)
      state = { ...state, messages: [...messages, assistantMessage, ...toolResults] }
      continue  // next iteration
    }

    // ⑦ Handle exceptional cases (max_tokens, prompt_too_long, etc.)
    // ...
  }
}

Context compaction: prevent conversation "explosion"

Claude has a context-window limit (~200K tokens). When history gets too long, queryLoop has two strategies:

AutoCompact

Triggered when token usage approaches the limit:

// src/services/compact/autoCompact.ts
  contextTokens: number,
  maxTokens: number,
): 'safe' | 'warning' | 'critical' {
  const ratio = contextTokens / maxTokens
  if (ratio < 0.7) return 'safe'
  if (ratio < 0.85) return 'warning'
  return 'critical'  // trigger AutoCompact
}

How AutoCompact works:

Send history messages to a small model (Haiku)
Ask it to generate a conversation summary
Replace old messages with the summary
Continue the current Turn

ReactiveCompact (feature gate)

// Controlled by compile switch
const reactiveCompact = feature('REACTIVE_COMPACT')
  ? require('./services/compact/reactiveCompact.js')
  : null

ReactiveCompact triggers immediately upon prompt_too_long errors and retries, more aggressively than AutoCompact.

max_output_tokens recovery logic

When Claude output is truncated (stop_reason = 'max_tokens'), the loop attempts recovery:

const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3

// Recovery strategy
if (assistantMessage.stop_reason === 'max_tokens' && 
    state.maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
  
  // Append a "continue" user message
  const continueMessage = createUserMessage({
    content: [{ type: 'text', text: 'Continue' }]
  })
  
  state = {
    ...state,
    messages: [...messages, assistantMessage, continueMessage],
    maxOutputTokensRecoveryCount: state.maxOutputTokensRecoveryCount + 1,
    transition: { type: 'max_tokens_recovery' }
  }
  continue
}

This is why Claude can sometimes continue output automatically after being cut off.

Tool execution: runTools

Tool execution is the most complex part of the main loop and is delegated to runTools():

// src/services/tools/toolOrchestration.ts

// Inside queryLoop
const toolMessages = yield* runTools(assistantMessage, toolUseContext, canUseTool)

runTools handles:

Concurrent execution of multiple tools (one message may have multiple tool_use blocks)
Permission checks (canUseTool)
Tool result collection
Error handling (tool failures become results returned to Claude, not loop crashes)

Token budget (feature gate)

const budgetTracker = feature('TOKEN_BUDGET') ? createBudgetTracker() : null

The token budget system (TOKEN_BUDGET switch) tracks token consumption in the current Turn and forcibly terminates the loop when budget is exceeded, then informs Claude.

Exit state: Terminal

queryLoop is not an infinite loop; it returns a Terminal type to indicate why it exited:

type Terminal =
  | { type: 'end_turn' }           // normal finish
  | { type: 'max_turns' }          // reached max turns
  | { type: 'abort' }              // user interrupt (Ctrl+C)
  | { type: 'budget_exceeded' }    // token budget exceeded
  | { type: 'error'; error: Error } // unrecoverable error

3.3 QueryEngine: Conversation state manager — detailed QueryEngine class
3.4 System Prompt construction mechanism — parsing fetchSystemPromptParts