1. Agent Loop — The Core Cycle

Chapter Goals

Implement the heart of a coding agent: a while loop that repeatedly calls the LLM → checks whether tools need to be executed → executes the tools → feeds the results back to the LLM → repeats, until the LLM considers the task complete.

graph TB
    subgraph Agent Loop
        A[User Message] --> B[Call LLM API]
        B --> C{Response contains<br/>tool_use?}
        C -->|Yes| D[Execute Tool]
        D --> E[Push Tool Result to Messages]
        E --> B
        C -->|No| F[Output Text<br/>Exit Loop]
    end

    style B fill:#7c5cfc,color:#fff
    style D fill:#e8e0ff

How Claude Code Does It

Two-Layer Architecture

Claude Code splits the Agent Loop into two layers:

  • QueryEngine (~1155 lines): Session-level, manages the entire conversation lifecycle — user input processing, USD budget checks, token accounting, session recovery
  • queryLoop (~1728 lines): Single-turn level, manages one query’s execution — message compaction, API calls, tool execution, error recovery

The benefit of this split is separation of concerns: QueryEngine doesn’t need to know “how to recover from a PTL (prompt_too_long) error,” and queryLoop doesn’t need to know “how to parse user input.”

queryLoop: Async Generator

queryLoop’s signature is async function* — an async generator. The reasons for choosing this over callbacks/events:

  1. Backpressure control: The producer won’t continue until the consumer finishes processing, naturally preventing event buildup
  2. Linear control flow: All loop branches are expressed with plain continue / break, no state machine needed
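
Both points can be seen in a toy example. The following is a minimal Python sketch, not Claude Code's actual code: query_loop, run_session, and the scripted responses are hypothetical stand-ins. It records the order in which events are produced and consumed.

```python
import asyncio
from typing import AsyncIterator


async def query_loop(responses: list[dict], log: list[str]) -> AsyncIterator[dict]:
    """Toy stand-in for queryLoop: one scripted response per LLM call."""
    for turn, response in enumerate(responses, start=1):
        log.append(f"produce {turn}")
        # Execution pauses here until the consumer asks for the next event.
        yield {"turn": turn, "text": response["text"]}
        if response.get("tool") is None:
            break  # no tool call: the task is done
        # Falling through to the next iteration plays the role of `continue`.


async def run_session(responses: list[dict]) -> list[str]:
    """Toy stand-in for the session layer: consumes events one at a time."""
    log: list[str] = []
    async for event in query_loop(responses, log):
        await asyncio.sleep(0)  # pretend rendering takes a moment
        log.append(f"consume {event['turn']}")
    return log
```

In the returned log, every "produce N" entry is followed by its "consume N" entry before the next turn is produced, so the producer never runs ahead of the consumer. The only control flow needed is a plain loop with break.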

Seven Continue Reasons

The loop has 7 continuation points, corresponding to 7 different scenarios:

| # | Name | Trigger Scenario | Handling Strategy |
|---|------|------------------|-------------------|
| 1 | next_turn | Model called a tool | Execute tool, push result to messages, continue |
| 2 | collapse_drain_retry | PTL error, pending collapse operations | Commit collapse to free space, retry |
| 3 | reactive_compact_retry | PTL error, collapse space insufficient | Force full summary compaction, retry |
| 4 | max_output_tokens_escalate | Output token truncation, first time | Escalate to higher token limit (16K → 64K), retry |
| 5 | max_output_tokens_recovery | Output token truncation, escalation unavailable | Inject continuation prompt, retry up to 3 times |
| 6 | stop_hook_blocking | Task complete but Stop Hook blocked | Continue execution loop |
| 7 | token_budget_continuation | API-side token budget exhausted | Continue generation |

Our simplified implementation only handles case 1: continue if there’s a tool_use, otherwise stop.
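
To show how more than one continuation point could fit into a single decision, here is a hedged Python sketch covering cases 1, 4, and 5 only. LoopState, continue_reason, and the numeric limits are assumptions made for illustration (16K and 64K are read as 16,000 and 64,000 tokens). "Escalation unavailable" is approximated as "already escalated." The "max_tokens" value is the stop reason the Anthropic API reports for truncated output.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_LIMIT = 16_000      # assumed reading of "16K"
ESCALATED_LIMIT = 64_000    # assumed reading of "64K"
MAX_RECOVERY_ATTEMPTS = 3


@dataclass
class LoopState:
    max_output_tokens: int = DEFAULT_LIMIT
    recovery_attempts: int = 0


def continue_reason(stop_reason: str, has_tool_use: bool, state: LoopState) -> Optional[str]:
    """Return why the loop should continue, or None if it should stop."""
    if has_tool_use:
        return "next_turn"  # case 1: run the tools and call the model again
    if stop_reason == "max_tokens":
        if state.max_output_tokens < ESCALATED_LIMIT:
            state.max_output_tokens = ESCALATED_LIMIT
            return "max_output_tokens_escalate"  # case 4: first truncation
        if state.recovery_attempts < MAX_RECOVERY_ATTEMPTS:
            state.recovery_attempts += 1
            return "max_output_tokens_recovery"  # case 5: up to 3 retries
    return None  # nothing left to try: exit the loop
```

The chapter's implementation corresponds to the first branch alone: a tool_use means continue, anything else means stop.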

Error Withholding Strategy

This design deserves special attention: recoverable errors are not immediately exposed to the upper layer.

When output tokens are truncated, directly yielding the error to QueryEngine would show an error in the UI — but queryLoop’s subsequent recovery logic can actually handle this automatically. So Claude Code’s approach is to “withhold” the error first, execute recovery logic, and if successful, the user never notices. Only if recovery fails is the error finally exposed. Most max_output_tokens and prompt_too_long errors are silently handled this way.
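
The withholding idea can be sketched as follows. This is an illustration under stated assumptions, not Claude Code's implementation: RecoverableError, query_with_withholding, and the call_model / recover callbacks are all hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional


class RecoverableError(Exception):
    """Hypothetical marker for errors the loop can try to fix on its own."""


async def query_with_withholding(
    call_model: Callable[[int], Awaitable[str]],
    recover: Callable[[Exception], Awaitable[None]],
    max_retries: int = 1,
) -> AsyncIterator[dict]:
    withheld: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            text = await call_model(attempt)
        except RecoverableError as err:
            withheld = err  # hold the error back instead of yielding it
            if attempt < max_retries:
                await recover(err)  # e.g. compact history or raise the token limit
            continue
        yield {"type": "text", "text": text}
        return
    # Recovery never succeeded: only now does the consumer see the error.
    yield {"type": "error", "message": str(withheld)}
```

If the retry succeeds, the consumer receives only the text event and never learns that an error occurred. The error event is emitted only after the last attempt has failed.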

Parallel Tool Execution

Claude Code uses StreamingToolExecutor to execute tools in parallel during the API streaming response:

Serial (our implementation):
  [========= API streaming response =========][tool1][tool2][tool3]

Parallel (Claude Code):
  [========= API streaming response =========]
       ^ tool1's JSON complete -> execute immediately
            ^ tool2's JSON complete -> execute immediately

A typical API response has a 5-30 second streaming window, during which multiple tools can complete concurrently.
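
A rough Python sketch of the parallel variant, using asyncio tasks. It does not reproduce StreamingToolExecutor. fake_stream, run_tool, and stream_and_execute are hypothetical, and the sleeps stand in for network and tool latency.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(tool_names: list[str], gap: float = 0.01) -> AsyncIterator[dict]:
    """Hypothetical API stream: one finished tool_use block every `gap` seconds."""
    for name in tool_names:
        await asyncio.sleep(gap)
        yield {"type": "tool_use", "id": f"toolu_{name}", "name": name}


async def run_tool(block: dict, log: list[str]) -> dict:
    log.append(f"start {block['name']}")
    await asyncio.sleep(0.03)  # stand-in for real tool work
    return {"type": "tool_result", "tool_use_id": block["id"],
            "content": f"{block['name']} finished"}


async def stream_and_execute(tool_names: list[str]) -> tuple[list[dict], list[str]]:
    log: list[str] = []
    tasks = []
    async for block in fake_stream(tool_names):
        # Start the tool right away; do not wait for the stream to end.
        tasks.append(asyncio.create_task(run_tool(block, log)))
    log.append("stream_end")
    # gather() returns results in task order, so they line up with the tool_use blocks.
    results = await asyncio.gather(*tasks)
    return list(results), log
```

In the log, the earlier tools start before "stream_end" is recorded, which is the overlap shown in the diagram. A real executor would also need to decide which tools are safe to run concurrently, which this sketch ignores.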

Our Implementation

We merge the two-layer architecture into a single Agent class, with chatAnthropic() as the core method:

TypeScript

// agent.ts -- chatAnthropic method (core Agent Loop)
 
private async chatAnthropic(userMessage: string): Promise<void> {
  this.anthropicMessages.push({ role: "user", content: userMessage });
  // Trigger auto-compact at the turn boundary: the last message is now
  // plain user text, so compactAnthropic's slice(0, -1) won't sever a
  // tool_use <-> tool_result pair (see Chapter 7).
  await this.checkAndCompact();
 
  while (true) {
    if (this.abortController?.signal.aborted) break;
 
    const response = await this.callAnthropicStream();
 
    // Accumulate token usage
    this.totalInputTokens += response.usage.input_tokens;
    this.totalOutputTokens += response.usage.output_tokens;
    this.lastInputTokenCount = response.usage.input_tokens;
 
    // Extract tool_use blocks
    const toolUses: Anthropic.ToolUseBlock[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") toolUses.push(block);
    }
 
    // Push assistant response into history
    this.anthropicMessages.push({ role: "assistant", content: response.content });
 
    // No tool calls -> task complete
    if (toolUses.length === 0) {
      printCost(this.totalInputTokens, this.totalOutputTokens);
      break;
    }
 
    // Execute each tool serially
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const toolUse of toolUses) {
      if (this.abortController?.signal.aborted) break;
 
      const input = toolUse.input as Record<string, any>;
      printToolCall(toolUse.name, input);
 
      // Permission check (see Chapter 6)
      const perm = checkPermission(toolUse.name, input, this.permissionMode, this.planFilePath);
      if (perm.action === "deny") {
        toolResults.push({ type: "tool_result", tool_use_id: toolUse.id,
          content: `Action denied: ${perm.message}` });
        continue;
      }
      if (perm.action === "confirm" && perm.message && !this.confirmedPaths.has(perm.message)) {
        const confirmed = await this.confirmDangerous(perm.message);
        if (!confirmed) {
          toolResults.push({ type: "tool_result", tool_use_id: toolUse.id,
            content: "User denied this action." });
          continue;
        }
        this.confirmedPaths.add(perm.message);
      }
 
      const result = await executeTool(toolUse.name, input);
      printToolResult(toolUse.name, result);
      toolResults.push({ type: "tool_result", tool_use_id: toolUse.id, content: result });
    }
 
    // Push tool results as user message (Anthropic API requirement)
    this.anthropicMessages.push({ role: "user", content: toolResults });
  }
}

Python

# agent.py -- _chat_anthropic method (core Agent Loop)
 
async def _chat_anthropic(self, user_message: str) -> None:
    self._anthropic_messages.append({"role": "user", "content": user_message})
    # Trigger auto-compact at the turn boundary: the last message is now
    # plain user text, so _compact_anthropic's [:-1] won't sever a
    # tool_use <-> tool_result pair (see Chapter 7).
    await self._check_and_compact()
 
    while True:
        if self._aborted:
            break
 
        self._run_compression_pipeline()
        response = await self._call_anthropic_stream()
 
        self.total_input_tokens += response.usage.input_tokens
        self.total_output_tokens += response.usage.output_tokens
        self.last_input_token_count = response.usage.input_tokens
 
        tool_uses = [b for b in response.content if b.type == "tool_use"]
 
        self._anthropic_messages.append({
            "role": "assistant",
            "content": [self._block_to_dict(b) for b in response.content],
        })
 
        if not tool_uses:
            if not self.is_sub_agent:
                print_cost(self.total_input_tokens, self.total_output_tokens)
            break
 
        tool_results = []
        for tu in tool_uses:
            if self._aborted:
                break
            inp = dict(tu.input) if hasattr(tu.input, 'items') else tu.input
            print_tool_call(tu.name, inp)
 
            # Permission check (see Chapter 6)
            perm = check_permission(tu.name, inp, self.permission_mode, self._plan_file_path)
            if perm["action"] == "deny":
                tool_results.append({"type": "tool_result", "tool_use_id": tu.id,
                                     "content": f"Action denied: {perm.get('message', '')}"})
                continue
            if perm["action"] == "confirm" and perm.get("message") \
               and perm["message"] not in self._confirmed_paths:
                confirmed = await self._confirm_dangerous(perm["message"])
                if not confirmed:
                    tool_results.append({"type": "tool_result", "tool_use_id": tu.id,
                                         "content": "User denied this action."})
                    continue
                self._confirmed_paths.add(perm["message"])
 
            result = await self._execute_tool_call(tu.name, inp)
            print_tool_result(tu.name, result)
            tool_results.append({"type": "tool_result", "tool_use_id": tu.id, "content": result})
 
        self._anthropic_messages.append({"role": "user", "content": tool_results})

How the Message Array Grows

The key to understanding the Agent Loop is how the message array grows.

Turn 1:
  messages = [
    { role: "user",      content: "Help me fix the bug" }
    { role: "assistant", content: [text + tool_use(read_file)] }
    { role: "user",      content: [tool_result("file contents...")] }
  ]

Turn 2 (LLM sees file contents and decides to edit):
  messages = [
    ...first 3 messages,
    { role: "assistant", content: [text + tool_use(edit_file)] }
    { role: "user",      content: [tool_result("edit successful")] }
  ]

Turn 3 (LLM considers task complete):
  messages = [
    ...first 5 messages,
    { role: "assistant", content: [text("Fixed!")] }  <- no tool_use -> break
  ]

Each loop iteration that executes tools adds two messages to the array: one assistant message and one user message carrying the tool results. The final iteration adds only the assistant message. The model sees the complete history on every call, which is how it “remembers” what it has already done. Tool results are pushed with role: "user" because the Anthropic API requires it, and each result must be linked back to its call via tool_use_id.
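
The three-turn trace above can be replayed without any API call. In this small sketch, run_agent_loop, SCRIPT, and TOOLS are made-up names, and the scripted assistant turns replace the real LLM.

```python
from typing import Callable


def run_agent_loop(
    user_message: str,
    scripted_responses: list[list[dict]],
    tools: dict[str, Callable[..., str]],
) -> list[dict]:
    """Replay scripted assistant turns and return the resulting message array."""
    messages: list[dict] = [{"role": "user", "content": user_message}]
    for content in scripted_responses:  # each item stands in for one LLM call
        messages.append({"role": "assistant", "content": content})
        tool_uses = [b for b in content if b["type"] == "tool_use"]
        if not tool_uses:
            break  # plain text only: the loop ends after this one message
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b["id"],  # links the result to its call
                "content": tools[b["name"]](**b["input"]),
            }
            for b in tool_uses
        ]
        messages.append({"role": "user", "content": results})
    return messages


SCRIPT = [
    [{"type": "tool_use", "id": "toolu_1", "name": "read_file", "input": {"path": "app.py"}}],
    [{"type": "tool_use", "id": "toolu_2", "name": "edit_file", "input": {"path": "app.py"}}],
    [{"type": "text", "text": "Fixed!"}],
]
TOOLS = {
    "read_file": lambda path: f"contents of {path}",
    "edit_file": lambda path: "edit successful",
}
history = run_agent_loop("Help me fix the bug", SCRIPT, TOOLS)
print([m["role"] for m in history])
# prints ['user', 'assistant', 'user', 'assistant', 'user', 'assistant']
```

The result has six messages: the original request, two assistant/tool-result pairs, and the final text-only assistant message.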

AbortController: Graceful Interruption

TypeScript

async chat(userMessage: string): Promise<void> {
  this.abortController = new AbortController();
  try {
    await this.chatAnthropic(userMessage);
  } finally {
    this.abortController = null;
  }
  printDivider();
  this.autoSave();
}
 
abort() {
  this.abortController?.abort();
}

Python

async def chat(self, user_message: str) -> None:
    self._aborted = False
    if self.use_openai:
        await self._chat_openai(user_message)
    else:
        await self._chat_anthropic(user_message)
    if not self.is_sub_agent:
        print_divider()
        self._auto_save()
 
def abort(self) -> None:
    self._aborted = True

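In the TypeScript version, AbortController is the standard interruption mechanism: once abort() is called, the signal becomes aborted and the loop exits at the next checkpoint (the top of the while loop or just before the next tool). The signal is also passed to the API call, so an in-flight network request can be cancelled too. The Python version shown here uses a plain _aborted flag that is checked at the same checkpoints.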
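
One detail to watch when exiting at a checkpoint inside the tool loop: the Anthropic API expects every tool_use block in an assistant message to be answered by a tool_result in the following user message. The sketch below shows one possible way to keep the history valid after an abort. MiniAgent, run_tools, and the after_each callback are hypothetical; the optional is_error field of tool_result is used to flag the skipped calls.

```python
import asyncio
from typing import Callable, Optional


class MiniAgent:
    """Toy agent showing cooperative abort inside the tool loop."""

    def __init__(self) -> None:
        self._aborted = False
        self.executed: list[str] = []

    def abort(self) -> None:
        self._aborted = True

    async def run_tools(
        self,
        tool_uses: list[dict],
        after_each: Optional[Callable[[dict], None]] = None,
    ) -> list[dict]:
        results = []
        for tu in tool_uses:
            if self._aborted:
                # Checkpoint: skip the work but still answer the call.
                results.append({"type": "tool_result", "tool_use_id": tu["id"],
                                "content": "Interrupted by user.", "is_error": True})
                continue
            await asyncio.sleep(0)  # stand-in for real tool execution
            self.executed.append(tu["name"])
            results.append({"type": "tool_result", "tool_use_id": tu["id"],
                            "content": f"{tu['name']} ok"})
            if after_each is not None:
                after_each(tu)  # hook used here only to trigger abort() in a demo
        return results
```

If abort() fires after the first of three tools, only that tool runs, yet the returned list still holds three results, one per tool_use_id. The next request can then be sent without leaving a call unanswered.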


Next chapter: The driving force of the loop is tools — without tools, the LLM is just a chatbot. Let’s look at the tool system implementation.