# How Claude Code Works — Agent Optimization Guide

> Created by [Thiago Thomaz](https://thiagothomaz.com) from static analysis of the Claude Code codebase (version 2.1.88, accidentally leaked in March 2026). Generated by Claude Code itself through a static read of the 1,900 TypeScript files.

---

## 1. Internal Architecture Relevant to Agents

### Main loop: three nested layers

```
QueryEngine.submitMessage()           ← public entry point; 1 instance per session
  └── query()                         ← API loop; repeats until stop_reason=end_turn
        └── runTools() → runToolUse() ← executes tool_use blocks from the response
```

**QueryEngine** (`QueryEngine.ts`) is the orchestrator. It:

- Assembles the system prompt once per query (expensive — includes git status)
- Maintains `messages: Message[]` as immutable canonical state
- Passes to `query()` in a loop — each tool result triggers a new API turn

**query()** (`query.ts`):

- Streams from the API via `@anthropic-ai/sdk`
- On receiving a complete `tool_use` block, invokes `runTools()` without waiting for `stop_reason`
- Applies `applyToolResultBudget()` before sending results back — large results are truncated/persisted to disk

**Implication for agents**: every time a subagent is created via `AgentTool`, it receives a full fork of the parent's context — including message history. The cost of spawning a subagent is proportional to the current context size.

### Task / subagent types

`Task.ts` defines the available types:

| Type                  | ID Prefix | Use                                        |
| --------------------- | --------- | ------------------------------------------ |
| `local_agent`         | `a`       | Subagent via AgentTool on the same machine |
| `remote_agent`        | `r`       | Remote (headless) agent                    |
| `in_process_teammate` | `t`       | Teammate running in-process                |
| `local_workflow`      | `w`       | Local workflow                             |
| `local_bash`          | `b`       | Shell task                                 |
| `monitor_mcp`         | `m`       | MCP monitor                                |
| `dream`               | `d`       | Speculative                                |

`isTerminalTaskStatus()` defines terminal states: `completed`, `failed`, `killed`. An agent in a terminal state accepts no further messages.

### Subagent forking

`forkSubagent.ts` clones the parent context with one critical detail:

```ts
renderedSystemPrompt?: SystemPrompt  // frozen at fork time
```

The system prompt is **frozen at fork time** to prevent cache divergence with GrowthBook (feature flags can change between calls). Re-rendering the prompt for the fork, instead of reusing the frozen copy, would break the prompt cache.

---

## 2. How Context Is Managed and What That Means for Prompt Design

### What composes the system prompt

```ts
systemPrompt = [
  customSystemPrompt ?? defaultSystemPrompt,   // full replacement or default
  memoryMechanicsPrompt?,                       // persistent memory (if active)
  appendSystemPrompt?,                          // appended without replacing the default
]
```

`appendSystemPrompt` is the correct extension point for adding instructions without losing default behavior. Using `customSystemPrompt` eliminates all native instructions (tools, formatting, etc.).

### User context (injected per turn)

`context.ts` injects via `getUserContext()` (memoized):

- Contents of `CLAUDE.md` files found in the directory hierarchy
- Current date

`getSystemContext()` (memoized):

- Git status + branch + recent commits (max 2k chars, truncated)
- Cache breaker (to detect prompt cache breaks)

**Important**: git status consumes tokens on every turn. In large repos with many modified files, this field alone can consume hundreds of tokens per request.

### Message normalization before the API

`normalizeMessagesForAPI()` strips these fields before sending:

- `toolUseResult` (message wrapper — different from the `tool_result` block)
- `isVisibleInTranscriptOnly`
- `isMeta`
- `imagePasteIds`
- `isSynthetic`
- `sourceToolAssistantUUID`
- `isCompactSummary`

These fields exist only locally (UI, tracking). The API never sees them.
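
The stripping step can be sketched as a shallow map over the message list. The `Message` shape below is a simplified assumption; only the field names come from the list above:

```typescript
// Sketch of normalizeMessagesForAPI: local-only fields are removed so the
// API payload contains only what the model should see. Shallow copies keep
// the canonical message state immutable.
type LocalMessage = Record<string, unknown>;

const LOCAL_ONLY_FIELDS = [
  "toolUseResult",
  "isVisibleInTranscriptOnly",
  "isMeta",
  "imagePasteIds",
  "isSynthetic",
  "sourceToolAssistantUUID",
  "isCompactSummary",
] as const;

function normalizeMessagesForAPI(messages: LocalMessage[]): LocalMessage[] {
  return messages.map((msg) => {
    const clean = { ...msg }; // copy first; never mutate canonical state
    for (const field of LOCAL_ONLY_FIELDS) delete clean[field];
    return clean;
  });
}
```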

### Token counting — how the system measures

`tokenCountWithEstimation(messages)` — canonical function for threshold decisions:

1. Finds the last message with `usage` data (API response)
2. Walks back to the first sibling (same `message.id`)
3. Returns `usage tokens + rough estimate of newer messages`

This avoids undercounting with parallel tool calls, where multiple messages share the same usage response.
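
The three steps can be sketched as follows. The message shape, the 4-characters-per-token estimate, and the sibling walk (simplified here to a forward skip over messages sharing the same `id`) are assumptions, not the real types:

```typescript
// Illustrative sketch of tokenCountWithEstimation: siblings sharing one
// message.id come from the same API response, so the usage figure already
// covers them and they must not be estimated a second time.
interface Usage { input_tokens: number; output_tokens: number }
interface Msg { id?: string; usage?: Usage; text: string }

const estimateTokens = (text: string) => Math.ceil(text.length / 4); // rough heuristic

function tokenCountWithEstimation(messages: Msg[]): number {
  // 1. Find the last message carrying usage data from an API response.
  let anchor = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].usage) { anchor = i; break; }
  }
  if (anchor === -1) {
    return messages.reduce((n, m) => n + estimateTokens(m.text), 0);
  }

  // 2. Skip past siblings of the anchor (same message.id): the usage figure
  //    already accounts for them, which is what prevents miscounting with
  //    parallel tool calls.
  let end = anchor;
  while (
    end + 1 < messages.length &&
    messages[anchor].id !== undefined &&
    messages[end + 1].id === messages[anchor].id
  ) end++;

  // 3. Usage tokens plus a rough estimate for anything newer.
  const u = messages[anchor].usage!;
  return u.input_tokens + u.output_tokens +
    messages.slice(end + 1).reduce((n, m) => n + estimateTokens(m.text), 0);
}
```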

**For remaining budget calculation** (`task_budget.remaining`):

```ts
finalContextTokensFromLastResponse() {
  // uses iterations[-1] if available, otherwise input+output
  // does NOT include cache tokens — aligns with server-side formula
}
```

Cache tokens (`cache_creation_input_tokens`, `cache_read_input_tokens`) count toward the autocompact threshold but **are billed separately from regular input tokens** (cache writes at a premium, cache reads at a steep discount).

---

## 3. How Tools Are Orchestrated — Patterns That Reduce Latency

### Batching by `isConcurrencySafe`

`toolOrchestration.ts` partitions via `partitionToolCalls()`:

```
[Read, Glob, Grep, Edit, Read] →
  Batch 1: [Read, Glob, Grep]  ← isConcurrencySafe=true → runs in parallel
  Batch 2: [Edit]              ← isConcurrencySafe=false → runs alone
  Batch 3: [Read]              ← isConcurrencySafe=true → runs in parallel
```

The logic: consecutive `safe` tools form a batch; any `!safe` tool breaks the sequence and becomes its own batch.

**Concurrency limit**: `CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY` (default: 10).

**Design implication**: when building agents with custom tools, marking `isConcurrencySafe = true` on pure-read tools allows the model to run multiple lookups in parallel with no added latency cost.
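
The batching rule above can be sketched in a few lines. The `ToolCall` shape and function signature are assumptions; only the rule itself (consecutive safe calls merge, any unsafe call stands alone) comes from the description:

```typescript
// Sketch of partitionToolCalls: consecutive concurrency-safe calls extend
// the running batch; an unsafe call always opens a batch of its own.
interface ToolCall { name: string; isConcurrencySafe: boolean }

function partitionToolCalls(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  for (const call of calls) {
    const current = batches[batches.length - 1];
    if (call.isConcurrencySafe && current?.every((c) => c.isConcurrencySafe)) {
      current.push(call); // extend the running parallel batch
    } else {
      batches.push([call]); // unsafe call (or first call) starts a new batch
    }
  }
  return batches;
}
```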

### Context modifiers — serializing side effects

Tools that modify context (`!isConcurrencySafe`) return a `contextModifier`:

```ts
contextModifier?: (context: ToolUseContext) => ToolUseContext
```

For concurrent tools, modifiers are **queued** and applied sequentially after the full batch completes — in the order of the original blocks, not completion order. This prevents race conditions but means concurrent tools cannot "see" each other's side effects.
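
A minimal sketch of that queueing discipline, with a stand-in `Ctx` type (the real `ToolUseContext` is richer): results may resolve in any order, but `Promise.all` preserves input order, so modifiers are applied in block order regardless of completion order.

```typescript
// Sketch: run a batch concurrently, then apply queued context modifiers
// sequentially in the order of the original tool_use blocks.
type Ctx = { cwd: string; env: Record<string, string> };
type Modifier = (ctx: Ctx) => Ctx;

async function runBatch(
  ctx: Ctx,
  tools: Array<() => Promise<Modifier | undefined>>,
): Promise<Ctx> {
  // Promise.all returns results in input order, not completion order.
  const modifiers = await Promise.all(tools.map((t) => t()));
  // Apply modifiers sequentially; tools never see each other's side effects
  // mid-batch, which is exactly the behavior described above.
  return modifiers.reduce<Ctx>((c, mod) => (mod ? mod(c) : c), ctx);
}
```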

### Deferred tools and ToolSearch

Tools with `shouldDefer = true` are sent to the API with `defer_loading: true`. The model must use `ToolSearch` before it can call them.

Tools with `alwaysLoad = true` always appear in the initial prompt, even when ToolSearch is active.

**Impact**: deferred tools reduce the initial prompt size (schema is omitted), but add one extra `ToolSearch` round-trip before the first use. For rarely-called tools in long sessions, the trade-off is favorable.

### `maxResultSizeChars` — when results go to disk

Each tool defines a threshold. If the result exceeds it, it is saved to a file and the model receives a preview with the path. Tools like `Read` have `maxResultSizeChars = Infinity` because they already have their own internal limits (a `Read → file → Read` loop would create infinite recursion).

### `interruptBehavior`

```ts
interruptBehavior?(): 'cancel' | 'block'
```

- `'cancel'`: tool is discarded if the user sends a new message while it is running
- `'block'`: the new message waits for the tool to complete (default)

---

## 4. Practices That Reduce Token Consumption Without Losing Quality

### Prompt caching — how it works internally

The system uses `cache_control` markers in the system prompt at `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`. The static portion (tools, base instructions) is marked for caching; the dynamic portion (git status, date) comes after the boundary.

**Recommendation**: keep as much content as possible before the dynamic boundary. Any change after the boundary invalidates the cache only from that position forward.

Cache breaks are detected in `promptCacheBreakDetection.ts` — compactions notify this service to avoid false positives in metrics.
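
The static/dynamic split maps onto the Messages API's `cache_control` system blocks. A minimal sketch, with illustrative block contents (the real boundary constant is `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`):

```typescript
// Sketch: everything up to the boundary is marked cacheable ("ephemeral");
// the dynamic tail (git status, date) follows uncached, so changing it
// never invalidates the cached static prefix.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function buildSystemBlocks(staticPrompt: string, dynamicContext: string): SystemBlock[] {
  return [
    // Static portion: tools, base instructions — cache hit across turns.
    { type: "text", text: staticPrompt, cache_control: { type: "ephemeral" } },
    // Dynamic portion after the boundary.
    { type: "text", text: dynamicContext },
  ];
}
```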

### Autocompact — real thresholds

From `autoCompact.ts`:

```ts
AUTOCOMPACT_BUFFER_TOKENS = 13_000; // triggers autocompact
WARNING_THRESHOLD_BUFFER_TOKENS = 20_000;
ERROR_THRESHOLD_BUFFER_TOKENS = 20_000;
MANUAL_COMPACT_BUFFER_TOKENS = 3_000; // blocking limit
MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000; // reserved for compaction output
```

**Threshold formula**:

```
effectiveContextWindow = modelContextWindow - min(maxOutputTokens, 20000)
autocompactThreshold = effectiveContextWindow - 13000
```
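
Worked through with the constants above (the model numbers below are illustrative, not from the codebase):

```typescript
// Direct translation of the threshold formula, using the autoCompact.ts
// constants quoted above.
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000;

function autocompactThreshold(modelContextWindow: number, maxOutputTokens: number): number {
  const effectiveContextWindow =
    modelContextWindow - Math.min(maxOutputTokens, MAX_OUTPUT_TOKENS_FOR_SUMMARY);
  return effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS;
}

// e.g. a 200k window with 32k max output:
// 200_000 - min(32_000, 20_000) - 13_000 = 167_000
```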

**Circuit breaker**: after 3 consecutive autocompact failures, the system stops trying. A session with an irrecoverably large context will not waste API calls indefinitely.

**Compaction order**: the system tries `sessionMemoryCompaction` first (lighter — selective pruning) before `compactConversation` (full LLM-generated summary).

### Tool result microcompaction

`microCompact.ts` defines which tools have compactable results:

```ts
const COMPACTABLE_TOOLS = new Set([
  FILE_READ_TOOL_NAME,
  SHELL_TOOL_NAMES,
  GREP_TOOL_NAME,
  GLOB_TOOL_NAME,
  WEB_SEARCH_TOOL_NAME,
  WEB_FETCH_TOOL_NAME,
  FILE_EDIT_TOOL_NAME,
  FILE_WRITE_TOOL_NAME,
]);
```

Results from tools in these groups can be cleared mid-conversation via `TIME_BASED_MC_CLEARED_MESSAGE` when they become stale in the context.

### `backfillObservableInput` — zero cache cost

```ts
backfillObservableInput?(input: Record<string, unknown>): void
```

This method can add derived fields to the input **after** the API has received the original. The API never sees the added fields — only hooks and observers do. This prevents backwards-compatibility fields from busting the prompt cache.

---

## 5. Anti-Patterns Found in the Codebase — What to Avoid

### 1. Spawning subagents with a bloated context

`AgentTool` forks the full message history. Subagents created late in a long session carry the entire history as their initial context. The cost is immediate and non-obvious.

**Correct**: Create subagents early, with minimal context. Pass only what is needed via a concise `customSystemPrompt` or initial message.

### 2. Unnecessarily mixing safe and unsafe tools

```
[Read, Edit, Read, Read]
→ Batch 1: [Read]       ← parallel, but alone
→ Batch 2: [Edit]       ← serial
→ Batch 3: [Read, Read] ← parallel
```

One `Edit` in the middle of multiple reads forces three serial execution phases where two would suffice. The orchestrator does not reorder calls: batches follow the order of the `tool_use` blocks in the model's response.

**Correct**: Group all required reads before writes in a single response. This requires clear prompt instructions to plan before acting.

### 3. Using `customSystemPrompt` to replace the default

Using `customSystemPrompt` completely eliminates the default system prompt — including tool descriptions and usage instructions. The model loses its knowledge of how to use the tools.

**Correct**: Use `appendSystemPrompt` to add instructions without losing base behavior.

### 4. Not declaring `querySource` in forked agents

`shouldAutoCompact` uses `querySource` to prevent recursion (`session_memory`, `compact`, and `marble_origami` are guarded). A forked agent that does not declare `querySource` can trigger autocompact inside autocompact.

**Correct**: Always declare `querySource` in agents with a specific purpose.

### 5. Assuming feature flags are real-time

GrowthBook flags are cached via `getFeatureValue_CACHED_MAY_BE_STALE`. Critical decisions based on feature flags may use stale values. The system does this intentionally (e.g., tool schemas are cached per session to avoid cache churn).

### 6. Synchronous hooks on critical paths

Pre/post tool hooks run before/after every tool. Slow hooks (HTTP, shell) block the tool execution pipeline. The system supports async hooks via `AsyncHookRegistry`, but the result is still awaited before continuing.

### 7. Large tool results without `maxResultSizeChars`

Without defining `maxResultSizeChars`, large results are included inline in the context. For tools that return variable-size content (search, web fetch), a single call can consume the entire context window.

---

## 6. Practical Recommendations for Agents in Production

### Set concurrency appropriately

```bash
CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY=5  # reduce for rate-limited environments
```

The default of 10 is aggressive. On APIs with per-minute rate limits, reducing to 3–5 cuts 429 errors without significant latency impact.

### Monitor tokens with the correct formula

Do not rely on `usage.input_tokens` alone to calculate remaining context. Use:

```
currentContext = input_tokens + output_tokens
               + cache_creation_input_tokens
               + cache_read_input_tokens   ← occupies context even though heavily discounted
```

Cache read tokens are heavily discounted in cost but still occupy context: the model sees the content even without paying full price for it.
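
A sketch of that calculation. The field names match the API's `usage` object; the wrapper functions and the `contextWindow` parameter are assumptions for illustration:

```typescript
// All four usage components occupy the context window; cache_read is merely
// cheap, not absent from the window.
interface ApiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

function contextTokensInWindow(usage: ApiUsage): number {
  return (
    usage.input_tokens +
    usage.output_tokens +
    (usage.cache_creation_input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0)
  );
}

function remainingContext(usage: ApiUsage, contextWindow: number): number {
  return contextWindow - contextTokensInWindow(usage);
}
```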

### Use a frozen `renderedSystemPrompt` in subagents

When creating subagents that need to share cache with the parent:

```ts
renderedSystemPrompt: parentContext.renderedSystemPrompt;
```

This ensures the system prompt hash is identical to the parent's and the cache is shared. Without this, any GrowthBook divergence causes a cache miss.

### Structure tools with `alwaysLoad` and `shouldDefer` correctly

- **`alwaysLoad = true`**: tools used on the first turn or at high frequency
- **`shouldDefer = true`**: specialized tools used rarely

An agent with 30+ tools should defer 80% of them. The extra `ToolSearch` round-trip costs roughly 200ms, which is usually cheaper than carrying 25 full schemas in every prompt.

### Treat the autocompact circuit breaker as a signal

If the system has stopped attempting autocompact after 3 failures (`consecutiveFailures >= 3`), the session is in a degraded state. The context may be above the limit and no compaction will succeed. At that point: end the session and start fresh with a clean context.

### Async hooks for non-blocking validations

`AsyncHookRegistry` allows registering hooks that run in the background. For security validations that do not need to block execution (logging, auditing), async hooks avoid the latency penalty.

### Avoid oversized CLAUDE.md files

CLAUDE.md content is injected on every turn via `getUserContext()`. A 10k-token CLAUDE.md consumes 10k tokens on every request in the session. For long documentation, use referenced files that the model reads on demand.

### `criticalSystemReminder_EXPERIMENTAL`

The field `ToolUseContext.criticalSystemReminder_EXPERIMENTAL` exists as an injection point for critical instructions in the tool execution context. It is experimental and may be removed — but it signals that the system has a per-tool-invocation override mechanism that bypasses the normal system prompt.
