11. Multi-Agent Architecture

Chapter Goals

Implement a Sub-Agent system: allow the main Agent to spawn independent sub-agents that perform exploration, planning, and general tasks, returning results to the main Agent when complete. This is Claude Code’s most important “divide and conquer” mechanism for handling complex tasks.

graph TB
    User[User request] --> Main[Main Agent]
    Main -->|agent tool_use| Dispatch{type?}
    Dispatch -->|explore| Explore[Explore Sub-Agent<br/>Read-only · Fast search]
    Dispatch -->|plan| Plan[Plan Sub-Agent<br/>Read-only · Structured planning]
    Dispatch -->|general| General[General Sub-Agent<br/>Full tool set]

    Explore --> Result[Return text result]
    Plan --> Result
    General --> Result
    Result --> Main

    subgraph Sub-Agent Sandbox
        Explore
        Plan
        General
    end

    style Main fill:#7c5cfc,color:#fff
    style Dispatch fill:#e8e0ff
    style Result fill:#e8e0ff

How Claude Code Does It

Claude Code’s multi-agent system is implemented in src/tools/AgentTool/, supporting three collaboration modes:

| Mode | Characteristics |
| --- | --- |
| Sub-Agent (fork-return) | Forks to execute independently, returns result on completion |
| Coordinator | A coordinator assigns tasks to multiple Workers |
| Swarm Team | Multiple Agents collaborate as peers, communicating via mailboxes |

We implement the Sub-Agent mode, which is also the most commonly used.

Built-in Agent Types

  • Explore: Uses Haiku model (cheaper), read-only tool set, specialized for code search
  • Plan: Read-only + structured output, designs implementation plans
  • General: Full tool set (except it cannot recursively create sub-agents)
  • Custom: Defined via .claude/agents/*.md files

Key Design of Coordinator Mode

Coordinator turns the main Agent into a pure orchestrator — its tool set is hard-limited to only Agent (spawn Workers) and SendMessage (continue a Worker), with absolutely no ability to perform file operations. This hard constraint prevents the coordinator from “being too lazy to delegate and doing it itself,” which would cause it to degrade into a regular single Agent.

The standard workflow has four phases: Research (parallel, read-only) → Synthesize (coordinator, serial comprehension) → Implement (serial, by file set) → Verify.

The synthesis phase has a counter-intuitive constraint: the prompt explicitly forbids writing “based on your findings.” This forces the coordinator to genuinely understand and make research results concrete (including file paths, line numbers), rather than passing the comprehension work to the next Worker.

Each Worker is an independent Agent starting from scratch that cannot see the coordinator’s conversation with the user, so the prompt the coordinator writes for Workers must be self-contained — this is the biggest pitfall in Coordinator mode.
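The four phases and the self-contained-prompt rule can be sketched as follows. This is a minimal illustration under stated assumptions, not Claude Code's implementation: run_worker, coordinate, and the prompt wording are hypothetical, and the Worker is a stub that echoes its prompt instead of calling a model.

```python
import asyncio

async def run_worker(prompt: str) -> str:
    """Hypothetical stand-in for a Worker; a real one would run a full agent loop."""
    await asyncio.sleep(0)  # yield control, as a real model call would
    return f"done: {prompt}"

async def coordinate(task: str, areas: list[str], file_sets: list[list[str]]) -> dict:
    # Phase 1, Research: read-only Workers run in parallel.
    # Every prompt restates the task because a Worker cannot see the user conversation.
    research = await asyncio.gather(
        *(run_worker(f"Task: {task}. Read-only research on {area}.") for area in areas)
    )
    # Phase 2, Synthesize: the coordinator itself condenses the findings.
    synthesis = "; ".join(research)
    # Phase 3, Implement: one Worker per file set, run serially.
    implemented = []
    for files in file_sets:
        prompt = f"Task: {task}. Edit only {', '.join(files)}. Context: {synthesis}"
        implemented.append(await run_worker(prompt))
    # Phase 4, Verify.
    verification = await run_worker(f"Task: {task}. Verify the changes.")
    return {"research": list(research), "implemented": implemented, "verification": verification}
```

Each Worker prompt carries the task and the synthesized context, so no Worker depends on conversation history it cannot see.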

Tool Filtering: 4-Layer Pipeline

Sub-agent tool access goes through a 4-layer filter, implementing defense in depth:

  1. Remove meta-tools (TaskOutput, EnterPlanMode, AskUserQuestion, etc.) — sub-agents should not control Agent execution flow
  2. Additional restrictions for custom Agents — user-defined types don’t get the same trust level as built-in types
  3. Async Agents use a whitelist mode — background execution can’t display interactive UI, requiring strict limits
  4. Agent-type-level disallowedTools — e.g., Explore explicitly excludes write tools

The first three layers are global policies; the fourth is type-level policy. Even if a custom Agent sets disallowedTools: [], the first three layers still apply.
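The layering can be sketched as a chain of list filters. This is an illustrative sketch only: AgentSpec, filter_tools, and the contents of the three sets are assumptions made for the example (in particular, what layer 2 actually blocks is not stated above).

```python
from dataclasses import dataclass, field

# Hypothetical sets, chosen only for illustration.
META_TOOLS = {"TaskOutput", "EnterPlanMode", "AskUserQuestion"}
CUSTOM_AGENT_BLOCKED = {"agent"}
ASYNC_ALLOWED = {"read_file", "list_files", "grep_search"}

@dataclass
class AgentSpec:
    is_custom: bool = False
    is_async: bool = False
    disallowed_tools: set[str] = field(default_factory=set)

def filter_tools(all_tools: list[str], spec: AgentSpec) -> list[str]:
    # Layer 1 (global): meta-tools never reach a sub-agent.
    tools = [t for t in all_tools if t not in META_TOOLS]
    # Layer 2 (global): extra restrictions for user-defined agent types.
    if spec.is_custom:
        tools = [t for t in tools if t not in CUSTOM_AGENT_BLOCKED]
    # Layer 3 (global): background agents switch to an allowlist.
    if spec.is_async:
        tools = [t for t in tools if t in ASYNC_ALLOWED]
    # Layer 4 (per type): the agent type's own disallowedTools.
    return [t for t in tools if t not in spec.disallowed_tools]
```

Because layers 1 to 3 run before the type-level list is consulted, an empty disallowed_tools set cannot widen access.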

Context Isolation

Sub-agents use deny-by-default: message history is completely independent, abortController propagates one-way (parent abort → child abort, but not the reverse), and sub-agent state changes don’t propagate to the parent UI by default. There’s only one exception: background processes started by Bash must be registered in the root store, or they become zombie processes.
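One-way abort propagation can be illustrated with a small tree of signals. The AbortSignal class below is a hypothetical stand-in written for this example; it is not the abortController used by Claude Code.

```python
class AbortSignal:
    """Hypothetical minimal abort signal with parent-to-child propagation."""

    def __init__(self, parent: "AbortSignal | None" = None) -> None:
        self.aborted = False
        self._children: list["AbortSignal"] = []
        if parent is not None:
            parent._children.append(self)
            # A child created under an already-aborted parent starts aborted.
            self.aborted = parent.aborted

    def abort(self) -> None:
        self.aborted = True
        for child in self._children:
            child.abort()  # propagate downward only; the parent is never touched
```

Aborting a child leaves its parent running, while aborting the parent stops every descendant.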

Worktree Isolation

When multiple Agents write files in parallel, Claude Code assigns each writing Agent an independent Git Worktree — sharing the .git directory but with independent working directories, completely conflict-free, with much less overhead than git clone.
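A minimal sketch of giving each writing agent its own worktree, using the standard git worktree add command. The directory layout, the branch naming, and both function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

def worktree_command(repo: Path, agent_id: str) -> list[str]:
    """Build the `git worktree add` command for one writing agent."""
    path = repo.parent / f"{repo.name}-wt-{agent_id}"  # hypothetical layout
    branch = f"agent/{agent_id}"                        # hypothetical branch naming
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]

def create_worktree(repo: Path, agent_id: str) -> Path:
    """Create the worktree and return its working directory."""
    cmd = worktree_command(repo, agent_id)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return Path(cmd[-1])
```

Each agent then runs with its working directory set to the returned path, so parallel edits land in separate checkouts on separate branches.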

Our Implementation

With ~199 lines in subagent.ts plus minor changes to the Agent class, we implement the core of the Sub-Agent pattern.

| Claude Code | Our Implementation | Simplification Reason |
| --- | --- | --- |
| 5-stage execution pipeline | Direct new Agent + runOnce | No need for fork processes, cache sharing |
| 4-layer tool filter pipeline | 1 Set + filter | Only 3 fixed types |
| Haiku model for Explore | Unified main model | Reduces configuration complexity |
| deny-by-default context isolation | Natural isolation (independent Agent instances) | new Agent comes with independent message history |

Key Code

1. Agent Type Configuration — subagent.ts

TypeScript

export type SubAgentType = "explore" | "plan" | "general";
 
const READ_ONLY_TOOLS = new Set([
  "read_file", "list_files", "grep_search", "run_shell"
]);
 
function getReadOnlyTools(): ToolDef[] {
  return toolDefinitions.filter((t) => READ_ONLY_TOOLS.has(t.name));
}

Python

READ_ONLY_TOOLS = {"read_file", "list_files", "grep_search"}
 
def _get_read_only_tools() -> list[ToolDef]:
    return [t for t in tool_definitions if t["name"] in READ_ONLY_TOOLS]

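Why is run_shell in the TypeScript “read-only” tool set? Read-only commands like git log, find, and wc are essential for code exploration; prohibiting shell entirely would severely weaken Explore’s capabilities. The tool itself is not restricted, so safety here relies on system prompt constraints rather than hard enforcement. (The Python version shown here takes the stricter route: it omits run_shell, and its prompt says so.) The prompts: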

TypeScript

const EXPLORE_PROMPT = `You are an Explore agent — a fast, READ-ONLY sub-agent...
 
IMPORTANT CONSTRAINTS:
- You are READ-ONLY. Do NOT modify any files.
- If using run_shell, only use read commands (ls, cat, find, grep, git log, etc.)
- Do NOT use write, edit, rm, mv, or any destructive shell commands.
 
Be fast and thorough. Use multiple tool calls when possible.
Return a concise summary of your findings.`;

Python

EXPLORE_PROMPT = """You are an Explore agent — a fast, READ-ONLY sub-agent specialized for codebase exploration.
 
IMPORTANT CONSTRAINTS:
- You are READ-ONLY. You only have access to read_file, list_files, and grep_search.
- Do NOT attempt to modify any files.
 
Be fast and thorough. Use multiple tool calls when possible. Return a concise summary of your findings."""

The Plan Agent is also read-only, but its prompt guides it to produce structured plans:

TypeScript

const PLAN_PROMPT = `You are a Plan agent — a READ-ONLY sub-agent specialized for designing implementation plans.
 
Your job:
- Analyze the codebase to understand the current architecture
- Design a step-by-step implementation plan
- Identify critical files that need modification
- Consider architectural trade-offs
 
Return a structured plan with:
1. Summary of current state
2. Step-by-step implementation steps
3. Critical files for implementation
4. Potential risks or considerations`;

Python

PLAN_PROMPT = """You are a Plan agent — a READ-ONLY sub-agent specialized for designing implementation plans.
 
Return a structured plan with:
1. Summary of current state
2. Step-by-step implementation steps
3. Critical files for implementation
4. Potential risks or considerations"""

The General Agent gets all tools except agent:

TypeScript

const GENERAL_PROMPT = `You are a General sub-agent handling an independent task.
Complete the assigned task and return a concise result. You have access to all tools.`;
 
export function getSubAgentConfig(type: SubAgentType): SubAgentConfig {
  // Check custom agents first
  const custom = discoverCustomAgents().get(type);
  if (custom) {
    // Never hand the agent tool to a sub-agent, even if allowed-tools lists it
    const base = toolDefinitions.filter(t => t.name !== "agent");
    const tools = custom.allowedTools
      ? base.filter(t => custom.allowedTools!.includes(t.name))
      : base;
    return { systemPrompt: custom.systemPrompt, tools };
  }
  switch (type) {
    case "explore":
      return { systemPrompt: EXPLORE_PROMPT, tools: getReadOnlyTools() };
    case "plan":
      return { systemPrompt: PLAN_PROMPT, tools: getReadOnlyTools() };
    case "general":
      return {
        systemPrompt: GENERAL_PROMPT,
        tools: toolDefinitions.filter((t) => t.name !== "agent"),
      };
  }
}

Python

GENERAL_PROMPT = "You are a General sub-agent handling an independent task. Complete the assigned task and return a concise result. You have access to all tools."
 
def get_sub_agent_config(agent_type: str) -> dict:
    custom = _discover_custom_agents().get(agent_type)
    if custom:
        # Never hand the agent tool to a sub-agent, even if allowed-tools lists it
        tools = [t for t in tool_definitions if t["name"] != "agent"]
        if custom["allowed_tools"]:
            tools = [t for t in tools if t["name"] in custom["allowed_tools"]]
        return {"system_prompt": custom["system_prompt"], "tools": tools}
 
    read_only = _get_read_only_tools()
    if agent_type == "explore":
        return {"system_prompt": EXPLORE_PROMPT, "tools": read_only}
    elif agent_type == "plan":
        return {"system_prompt": PLAN_PROMPT, "tools": read_only}
    else:
        return {"system_prompt": GENERAL_PROMPT, "tools": [t for t in tool_definitions if t["name"] != "agent"]}

2. Agent Tool Definition — tools.ts

agent is registered as a regular tool. type is not required — when the LLM is unsure, it can omit it and fall back to general:

TypeScript

{
  name: "agent",
  description:
    "Launch a sub-agent to handle a task autonomously. Sub-agents have isolated context " +
    "and return their result. Types: 'explore' (read-only, fast search), " +
    "'plan' (read-only, structured planning), 'general' (full tools).",
  input_schema: {
    type: "object",
    properties: {
      description: { type: "string", description: "Short (3-5 word) description of the sub-agent's task" },
      prompt: { type: "string", description: "Detailed task instructions for the sub-agent" },
      type: {
        type: "string",
        enum: ["explore", "plan", "general"],
        description: "Agent type. Default: general",
      },
    },
    required: ["description", "prompt"],
  },
}

Python

{
    "name": "agent",
    "description": "Launch a sub-agent to handle a task autonomously. Types: 'explore' (read-only), 'plan' (read-only, structured planning), 'general' (full tools).",
    "input_schema": {
        "type": "object",
        "properties": {
            "description": {"type": "string", "description": "Short (3-5 word) description of the sub-agent's task"},
            "prompt": {"type": "string", "description": "Detailed task instructions for the sub-agent"},
            "type": {"type": "string", "enum": ["explore", "plan", "general"], "description": "Agent type. Default: general"},
        },
        "required": ["description", "prompt"],
    },
}

3. Agent Class Modifications — agent.ts

Only 4 changes are needed to make the same Agent class serve both the main Agent and sub-agents.

3a. Constructor: Accept Custom Configuration

TypeScript

interface AgentOptions {
  // ...
  customSystemPrompt?: string;
  customTools?: ToolDef[];
  isSubAgent?: boolean;
}
 
constructor(options: AgentOptions = {}) {
  this.isSubAgent = options.isSubAgent || false;
  this.tools = options.customTools || toolDefinitions;
  this.systemPrompt = options.customSystemPrompt || buildSystemPrompt();
  // ...
}

Python

class Agent:
    def __init__(
        self,
        *,
        # ...
        custom_system_prompt: str | None = None,
        custom_tools: list[ToolDef] | None = None,
        is_sub_agent: bool = False,
    ):
        self.is_sub_agent = is_sub_agent
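        # Fall back only when no list was passed; an explicit empty list must stay empty
        self.tools = custom_tools if custom_tools is not None else tool_definitions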
        self._base_system_prompt = custom_system_prompt or build_system_prompt()

When customTools is omitted (undefined in TypeScript, None in Python), the Agent falls back to the full tool list, so the main Agent is unaffected.

3b. Output Capture: emitText + outputBuffer

Sub-agent text output can’t be printed directly; it needs to be collected and returned to the main Agent:

TypeScript

private outputBuffer: string[] | null = null;
 
private emitText(text: string): void {
  if (this.outputBuffer) {
    this.outputBuffer.push(text);   // Sub-agent: collect
  } else {
    printAssistantText(text);        // Main Agent: print directly
  }
}

Python

self._output_buffer: list[str] | None = None
 
def _emit_text(self, text: str) -> None:
    if self._output_buffer is not None:
        self._output_buffer.append(text)
    else:
        print_assistant_text(text)

outputBuffer has three states: null = main Agent mode (print directly), [] = sub-agent mode (start collecting), [...] = accumulating. The streaming callback only needs to call emitText, completely unaware of which mode it’s running in.

3c. runOnce: One-Shot Execution Entry Point

TypeScript

async runOnce(prompt: string): Promise<{ text: string; tokens: { input: number; output: number } }> {
  this.outputBuffer = [];
  const prevInput = this.totalInputTokens;
  const prevOutput = this.totalOutputTokens;
  await this.chat(prompt);                         // Reuse the full agent loop
  const text = this.outputBuffer.join("");
  this.outputBuffer = null;
  return {
    text,
    tokens: {
      input: this.totalInputTokens - prevInput,
      output: this.totalOutputTokens - prevOutput,
    },
  };
}

Python

async def run_once(self, prompt: str) -> dict:
    self._output_buffer = []
    prev_in = self.total_input_tokens
    prev_out = self.total_output_tokens
    await self.chat(prompt)
    text = "".join(self._output_buffer)
    self._output_buffer = None
    return {
        "text": text,
        "tokens": {
            "input": self.total_input_tokens - prev_in,
            "output": self.total_output_tokens - prev_out,
        },
    }

Tokens are calculated incrementally (post-run minus pre-run) because the Agent instance’s counters are cumulative. chat() is fully reused — it doesn’t care whether it’s running in the main Agent or a sub-agent, since the tool set and output destination were already configured in the constructor.
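The buffer lifecycle and the delta-based token accounting can be exercised with a stub. MiniAgent below is a hypothetical stand-in for the Agent class: its chat() fakes one model turn with made-up token numbers, and a list named printed stands in for the terminal.

```python
import asyncio

class MiniAgent:
    """Hypothetical stand-in for Agent: chat() fakes one model turn."""

    def __init__(self) -> None:
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self._output_buffer: list[str] | None = None  # None means "print directly"
        self.printed: list[str] = []                  # stands in for the terminal

    def _emit_text(self, text: str) -> None:
        if self._output_buffer is not None:
            self._output_buffer.append(text)
        else:
            self.printed.append(text)

    async def chat(self, prompt: str) -> None:
        # Fake usage numbers: one input token per word, two output tokens per turn.
        self.total_input_tokens += len(prompt.split())
        self.total_output_tokens += 2
        self._emit_text("echo: ")
        self._emit_text(prompt)

    async def run_once(self, prompt: str) -> dict:
        self._output_buffer = []  # open the buffer: switch to collecting
        prev_in, prev_out = self.total_input_tokens, self.total_output_tokens
        await self.chat(prompt)
        text = "".join(self._output_buffer)
        self._output_buffer = None  # close the buffer: back to printing
        return {"text": text, "tokens": {"input": self.total_input_tokens - prev_in,
                                         "output": self.total_output_tokens - prev_out}}
```

Calling chat() first and run_once() afterwards on the same instance shows that the reported tokens cover only the run_once call, even though the counters keep accumulating.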

3d. executeAgentTool: Execute Sub-Agent

TypeScript

private async executeAgentTool(input: Record<string, any>): Promise<string> {
  const type = (input.type || "general") as SubAgentType;
  const description = input.description || "sub-agent task";
  const prompt = input.prompt || "";
 
  printSubAgentStart(type, description);
 
  const config = getSubAgentConfig(type);
  const subAgent = new Agent({
    model: this.model,
    customSystemPrompt: config.systemPrompt,
    customTools: config.tools,
    isSubAgent: true,
    permissionMode: this.permissionMode === "plan" ? "plan" : "bypassPermissions",
  });
 
  try {
    const result = await subAgent.runOnce(prompt);
    this.totalInputTokens += result.tokens.input;
    this.totalOutputTokens += result.tokens.output;
    printSubAgentEnd(type, description);
    return result.text || "(Sub-agent produced no output)";
  } catch (e: any) {
    printSubAgentEnd(type, description);
    return `Sub-agent error: ${e.message}`;
  }
}

Python

async def _execute_agent_tool(self, inp: dict) -> str:
    # Use `or` so that an explicit null/empty value also falls back to the default
    agent_type = inp.get("type") or "general"
    description = inp.get("description") or "sub-agent task"
    prompt = inp.get("prompt") or ""
 
    print_sub_agent_start(agent_type, description)
 
    config = get_sub_agent_config(agent_type)
    sub_agent = Agent(
        model=self.model,
        custom_system_prompt=config["system_prompt"],
        custom_tools=config["tools"],
        is_sub_agent=True,
        permission_mode="plan" if self.permission_mode == "plan" else "bypassPermissions",
    )
 
    try:
        result = await sub_agent.run_once(prompt)
        self.total_input_tokens += result["tokens"]["input"]
        self.total_output_tokens += result["tokens"]["output"]
        print_sub_agent_end(agent_type, description)
        return result["text"] or "(Sub-agent produced no output)"
    except Exception as e:
        print_sub_agent_end(agent_type, description)
        return f"Sub-agent error: {e}"

When a sub-agent errors, it returns an error string rather than crashing the parent Agent — the parent Agent’s LLM sees the error message and can decide on its own whether to retry or try a different strategy.

Permission inheritance: Sub-agents default to bypassPermissions (the main Agent has already been authorized, so sub-agents don’t need to ask the user again), but Plan Mode must be inherited — otherwise sub-agents could bypass the read-only restriction, which would be a security hole.

The agent tool requires special dispatch because it needs access to the current Agent instance’s state (model, permissionMode, token counters) and can’t go through the stateless generic dispatch function:

TypeScript

private async executeToolCall(name: string, input: Record<string, any>): Promise<string> {
  if (name === "agent") {
    return this.executeAgentTool(input);
  }
  return executeTool(name, input);
}

Python

async def _execute_tool_call(self, name: str, inp: dict) -> str:
    if name == "agent":
        return await self._execute_agent_tool(inp)
    if name == "skill":
        return await self._execute_skill_tool(inp)
    return await execute_tool(name, inp)

4. The isSubAgent Flag

Sub-agents skip three operations that are only meaningful for the main Agent:

TypeScript

if (!this.isSubAgent) {
  printDivider();
  this.autoSave();
}
 
if (!this.isSubAgent) {
  printCost(this.totalInputTokens, this.totalOutputTokens);
}

Python

if not self.is_sub_agent:
    print_divider()
    self._auto_save()
 
if not self.is_sub_agent:
    print_cost(self.total_input_tokens, self.total_output_tokens)

  • Dividers: Sub-agent output is captured by the buffer and won’t appear in the terminal
  • Session saving: Sub-agents are one-time tasks; saving their session is pointless and could overwrite the main Agent’s file
  • Cost printing: Tokens are already aggregated to the parent Agent; sub-agents printing their own cost would create a false impression of double billing

5. Terminal UI — ui.ts

TypeScript

export function printSubAgentStart(type: string, description: string) {
  console.log(chalk.magenta(`\n  ┌─ Sub-agent [${type}]: ${description}`));
}
 
export function printSubAgentEnd(type: string, description: string) {
  console.log(chalk.magenta(`  └─ Sub-agent [${type}] completed`));
}

Python

def print_sub_agent_start(agent_type: str, description: str) -> None:
    console.print(f"\n  [magenta]┌─ Sub-agent [{agent_type}]: {description}[/magenta]")
 
def print_sub_agent_end(agent_type: str, _description: str) -> None:
    console.print(f"  [magenta]└─ Sub-agent [{agent_type}] completed[/magenta]")

6. Custom Agent Types: .claude/agents/*.md

An extension mechanism modeled on Claude Code’s .claude/agents/:

<!-- .claude/agents/reviewer.md -->
---
name: reviewer
description: Reviews code for bugs and style issues
allowed-tools: read_file, list_files, grep_search, run_shell
---
You are a code reviewer. Analyze the code thoroughly and report:
1. Bugs and potential issues
2. Style inconsistencies
3. Performance concerns

Discovery mechanism: Project-level (.claude/agents/) has higher priority than user-level (~/.claude/agents/), with same-name override. Frontmatter reuses parseFrontmatter(), sharing the same parser with Memory and Skills.
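The override order can be sketched as two directory scans, with the later scan winning. This is a simplified illustration: parse_frontmatter here is a stand-in that handles only flat key: value pairs, not the shared parser mentioned above, and the returned dictionary shape is an assumption based on the fields used in get_sub_agent_config.

```python
from pathlib import Path

def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Simplified stand-in for parseFrontmatter(): flat `key: value` pairs only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body

def discover_custom_agents(user_dir: Path, project_dir: Path) -> dict[str, dict]:
    agents: dict[str, dict] = {}
    # Load user-level first so project-level entries overwrite same-name ones.
    for directory in (user_dir, project_dir):
        if not directory.is_dir():
            continue
        for path in sorted(directory.glob("*.md")):
            meta, body = parse_frontmatter(path.read_text(encoding="utf-8"))
            name = meta.get("name", path.stem)
            raw = meta.get("allowed-tools", "")
            allowed = [t.strip() for t in raw.split(",") if t.strip()] or None
            agents[name] = {"system_prompt": body, "allowed_tools": allowed}
    return agents
```

Scanning the lower-priority directory first keeps the precedence rule in one place: whichever definition is written last into the dictionary wins.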

Key Design Decisions

Why Is Fork-Return a Better Starting Point Than Coordinator?

Fork-return’s advantages come from its simplicity: no shared state (a sub-agent cannot pollute the main Agent’s context), deterministic control flow (send request, wait for result), and simple fault tolerance (if a sub-agent errors, the main Agent keeps working). Coordinator is stronger at task parallelization but must also handle information sharing between Workers and conflict resolution, which makes it considerably more complex.

Why Can’t Sub-Agents Create Sub-Agents?

The General Agent’s tool list filters out agent. Without this restriction, nesting (A creates B, B creates C, and so on) would multiply token usage: each level has its own system prompt and message history, and if every agent spawns several children the total grows exponentially with depth. Claude Code has the same restriction; in practice, one level covers the vast majority of scenarios.
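To see how nesting multiplies cost, count the sub-agents created when every agent spawns the same number of children. The function name and the uniform-branching assumption are illustrative; real workloads are less regular.

```python
def agents_spawned(branching: int, depth: int) -> int:
    """Total sub-agents when every agent spawns `branching` children, `depth` levels deep."""
    return sum(branching ** level for level in range(1, depth + 1))
```

With a branching factor of 3, one level creates 3 sub-agents while three levels create 39; a plain chain (branching factor 1) grows only linearly, with 3 sub-agents at depth 3. Since every sub-agent pays for its own system prompt and message history, capping depth at 1 keeps the total proportional to the number of agent calls the main Agent makes.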

Why Do Explore/Plan Keep run_shell?

Read-only shell commands like git log --oneline -20 and find . -name "*.ts" | wc -l are essential for code exploration; completely prohibiting them would severely weaken capabilities. This design aligns with Claude Code’s Explore Agent — constrained via system prompt rather than completely disabling the tool.

Why Use a Buffer to Collect Output Instead of Callbacks?

A callback approach would require passing onText into the constructor and adding checks throughout the agent loop. The buffer approach only modifies emitText in one place: runOnce opens it, chat writes to it, runOnce collects and closes it. The lifecycle boundaries are clear, with zero impact on existing code.


The core insight of the entire implementation: a sub-agent is essentially just an Agent instance with different configuration. By adding a few optional parameters to the Agent class (customTools, customSystemPrompt, isSubAgent), the same agent loop serves both the main Agent and sub-agents, avoiding code duplication.

Next chapter: Connecting the Agent to external tool servers — MCP integration.