Inside Claude Code (1): The Harness

On March 31, 2026, Chaofan Shou (@Fried_rice) posted on X that “Claude code source code has been leaked via a map file in their npm registry!” The post included a direct link to the exposed source archive and screenshots of the unpacked codebase, and it quickly went viral, reaching 34.6M views. The attention around it was immediate: Anthropic’s official Claude Code repo was already at roughly 84.6K GitHub stars by March 30, and one of the first public source mirrors quickly passed 13K stars on April 1. The leak made Claude Code especially interesting to analyze, because it turned the tool into one of the clearest public examples of how a serious long-running coding agent is actually built.

I immediately forked the repo and started digging into it with a simple question in mind: how is Claude Code actually built under the hood? I was especially interested in the overall architecture, the agent harness, and the context-management system that lets it work across long-running sessions. I wanted to understand these both on their own terms and in comparison with my own T1-K agent architecture, which I described in my last article.

This article is the first in a short series analyzing Claude Code’s architecture. It begins with the harness, because that is the clearest entry point into the runtime. The following parts will examine context management and memory, so the series will move from the execution structure itself to the way the system handles information across turns and across sessions.

The architectural center

The right mental model for Claude Code is not “chat plus tools”, but a controlled execution harness for iterative coding work.

Its architectural core is the machinery that assembles context, executes actions, applies policy, preserves state, and keeps the session usable as it grows. The interface exposes the current state of that session.

Claude Code is built around a single execution architecture that can operate in two different ways. In both cases, a session layer preserves continuity across turns, while an execution loop drives the interaction between the model, tools, context, and stopping conditions.

What changes is the role of the top-level agent inside that loop.

In the first mode, the top-level agent is the direct operator. It reads files, uses tools, reacts to outputs, and works through the request itself inside a ReAct-style loop. In practice, this means the model alternates between reasoning and acting: it inspects the current state, decides whether a tool is needed, executes the action, incorporates the result, and continues from the updated context. Rather than producing a full answer in one shot, it advances iteratively until the task reaches a stable result.

In the second mode, the top-level agent becomes a coordinator. The same outer harness remains in place, but execution is delegated to worker agents, and the top-level agent focuses on planning, synthesis, and control. Instead of carrying out most actions itself, it interprets the user’s goal, decides what work should be delegated, launches one or more workers, receives their results, and determines what should happen next. Its role is therefore to maintain direction across the task rather than to perform every step directly. Workers handle the concrete operational work, while the coordinator keeps ownership of decomposition, routing, evaluation, and the final user-facing response.

So the key distinction is not between two different runtimes, but between two ways of using the same harness: direct execution and orchestrated execution.

Relative to the architecture of T1-K described in my previous article, Claude Code looks familiar in many of its deepest design choices. The control loop stays close to the model, tools are treated as structured primitives, and permissions are enforced through real capability boundaries rather than prompt text alone. The major differences are that Claude Code does not appear to be organized around an explicit scenario-building layer, and that even in orchestrated mode it does not isolate assessment as a first-class runtime role. It is therefore close to T1-K, but not identical in how it separates its roles.

I will start with the direct execution mode, because it shows the harness in its simplest form, and then move on to the orchestrated execution mode, highlighting the differences.

The direct execution mode

At the center is a session controller that preserves continuity across turns and an execution loop that repeatedly drives the model, tools, and control flow. Around that core sits a context system. Supporting both is a long-term memory layer that carries reusable knowledge across sessions, plus compaction and persistence mechanisms that keep long-running work usable, resumable, and within context limits.

Layer	What it does	What it handles
Session controller	Keeps one coding session coherent from beginning to end	conversation history, turn-to-turn state, persistence, final result framing
Baseline context builder	Prepares the initial context before the agent starts working	repository snapshot, current date, baseline instructions, session state brought back into context
Instructions system	Finds the instructions that should guide the work	global instructions, user instructions, project instructions, local instructions, included files, directory-specific rules
Execution loop	Runs the live work cycle until the task is finished	model calls, tool execution, continue/stop decisions
Reactive context loader	Adds extra context when execution reveals it is needed	newly relevant instructions, reminders, deltas, nested context packets, retrieved memory
Compaction and recovery	Keeps the session usable when context grows too large or a turn fails	summarization, truncation, retry paths, overflow handling
Long-term memory system	Stores knowledge outside the current session so it can be reused later	reusable facts, prior knowledge, retained preferences, persistent memory artifacts
Session history and persistence	Saves the running session so it can be resumed and reused	transcripts, checkpoints, persisted session state

The harness

The harness is the runtime structure that manages how a user request moves through the system. It organizes the interaction between the conversation context, the model, the available tools, and the final response.

As the diagram shows, the flow begins when a user prompt enters the session controller. From there, the system prepares the turn through a baseline context builder fed by the instructions system and by persisted session state. This produces the initial working view that will be handed to execution.

Once that setup is complete, the execution loop takes over. It runs the active exchange for that request: the model responds, tools may be used, additional context may be loaded when needed, and recovery mechanisms intervene if the turn runs into routine limits such as overflow or interruption. In parallel, session history, persistence, and memory components keep the run resumable and allow relevant information to be brought back into context as execution continues.

For example, imagine a chat where the user first says, “Read this document and summarize it.” The system opens a loop for that request, reads the document, produces the summary, and then stops. The summary, the document context, and everything learned during that exchange remain stored in the session state.

Then the user asks, “Now extract the risks mentioned in section 3.” A new execution loop starts for this new interaction, but it runs on top of the existing session state, which already remembers the document, the previous summary, and the broader conversation context.

Why this split matters

The split matters because each layer has a different role. The session engine (the session controller described above) keeps everything that must persist across the conversation, such as the history and other saved context. The turn loop (the execution loop) handles the work of one user request, running the cycle of response, tool use, and updates until that request is complete. This separation brings three benefits.

First, it separates memory from action. One layer keeps the session stable; the other focuses on finishing the current task.
Second, it improves stability. The long-lived session logic does not have to change every time the inner execution cycle is optimized.
Third, it improves visibility. Refusals, retries, stop conditions, recovery steps, and usage are explicit parts of the runtime rather than hidden side effects.

Figure 2. The outer layer (QueryEngine) preserves session context and framing; the inner layer (Query) runs the repeated cycle of model response, tool use, added context, and continuation.

What the session engine actually does

When a new message begins, the session engine does more than forward the request. It resets anything that only applies to the current turn, determines the active configuration, gathers the right context, frames the request, records the user message, and then hands control to the turn loop.

When the loop finishes, the session engine remains responsible for framing the result, handling replay or recovery when needed, and updating the session state.

The key point is that this outer layer is a real supervisory layer: it prepares the environment for the turn, supervises execution, and folds the outcome back into the ongoing session.
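The responsibilities described in this section can be sketched in a few lines. This is a minimal illustration, not Claude Code's actual QueryEngine: the names `SessionEngine`, `submit`, `turn_scratch`, and `echo_loop` are hypothetical, and the turn loop is replaced by a stand-in that only reports how much context it received.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: a turn loop takes the prepared messages and returns a reply.
TurnLoop = Callable[[list[str]], str]


@dataclass
class SessionEngine:
    """Illustrative outer layer; all names here are assumptions."""
    instructions: str
    history: list[str] = field(default_factory=list)   # persists across turns
    turn_scratch: dict = field(default_factory=dict)   # valid for one turn only

    def submit(self, prompt: str, turn_loop: TurnLoop) -> str:
        self.turn_scratch.clear()                          # reset per-turn state
        context = [self.instructions, *self.history]       # gather context
        self.history.append(f"user: {prompt}")             # record the user message
        reply = turn_loop([*context, f"user: {prompt}"])   # hand off to the loop
        self.history.append(f"assistant: {reply}")         # fold the outcome back in
        return reply


def echo_loop(messages: list[str]) -> str:
    """Stand-in turn loop that only reports how much context it received."""
    return f"saw {len(messages)} messages"


engine = SessionEngine(instructions="system: be concise")
first = engine.submit("Read this document and summarize it.", echo_loop)
second = engine.submit("Now extract the risks mentioned in section 3.", echo_loop)
print(first)
print(second)
```

The second call sees more messages than the first because the engine carried the earlier exchange forward, which is the continuity the outer layer exists to provide.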

What the turn loop actually does

The turn loop is best understood as a ReAct-style agent loop, not as a single model call. In simple terms, this means the system does not try to solve the whole request in one step. Instead, within one user interaction, it can generate a response, decide whether an external action is needed, execute that action, incorporate the result, and continue from the updated state until the request reaches a stopping point.

At each pass, it can rebuild a smaller working view of the conversation, select the active model configuration, stream output, handle tool-use requests, and feed the results back into context. The turn therefore unfolds as a repeated cycle of produce, act, update, and continue.
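The cycle of produce, act, update, and continue can be sketched as follows. This is a minimal illustration under stated assumptions, not the loop from the Claude Code source: `ModelStep`, `run_turn`, `scripted_model`, and the `read_file` tool are hypothetical names, and the model is a scripted stand-in rather than a real API call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelStep:
    """Hypothetical model output: either final text or a tool request."""
    text: str
    tool_name: Optional[str] = None
    tool_input: Optional[str] = None


def run_turn(
    model: Callable[[list[str]], ModelStep],
    tools: dict[str, Callable[[str], str]],
    messages: list[str],
    max_steps: int = 8,
) -> str:
    """Produce, act, update, continue until the model stops asking for tools."""
    for _ in range(max_steps):
        step = model(messages)                                  # produce
        if step.tool_name is None:
            return step.text                                    # stopping point
        result = tools[step.tool_name](step.tool_input or "")   # act
        messages.append(f"tool[{step.tool_name}]: {result}")    # update
    return "stopped: step budget exhausted"                     # explicit stop


def scripted_model(messages: list[str]) -> ModelStep:
    """Stand-in model: request one file read, then answer from the result."""
    tool_results = [m for m in messages if m.startswith("tool[")]
    if not tool_results:
        return ModelStep(text="", tool_name="read_file", tool_input="notes.txt")
    return ModelStep(text=f"done after {len(tool_results)} tool call(s)")


tools = {"read_file": lambda path: f"<contents of {path}>"}
answer = run_turn(scripted_model, tools, ["user: summarize notes.txt"])
print(answer)
```

The step budget is an assumption added for the sketch; it makes the stop condition explicit instead of relying on the model to always finish.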

The loop also includes explicit recovery paths for routine runtime constraints such as context pressure or output limits. Recovery is therefore part of normal execution, not an exceptional case. Taken together, this makes the turn loop a genuine execution mechanism: it does not simply generate text, but works through a request until it reaches a stable result.
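One such recovery path, reacting to context pressure, might look like the sketch below. It is an assumption-laden illustration: the function name `compact`, the character-based budget, and the placeholder summary are all hypothetical, and a real system would presumably count tokens and ask a model to write the summary.

```python
def compact(messages: list[str], max_chars: int, keep_recent: int = 2) -> list[str]:
    """Replace older messages with a one-line summary when over budget."""
    total = sum(len(m) for m in messages)
    if total <= max_chars or len(messages) <= keep_recent:
        return messages  # nothing to do, or nothing old enough to compact
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Placeholder summary; this sketch does not actually summarize content.
    summary = f"summary: {len(older)} earlier messages compacted"
    return [summary, *recent]


history = [f"message {i}: " + "x" * 50 for i in range(6)]
working_view = compact(history, max_chars=200)
print(working_view[0])
```

Because the check runs as an ordinary step before the next model call, hitting the limit becomes a routine branch of the loop rather than a failure.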

Orchestrated execution mode

Once the direct execution mode is clear, the second mode becomes easier to describe. The outer harness does not change. The same session layer still preserves continuity across turns, and the same turn loop still governs execution for each request. What changes is the role of the top-level agent inside that loop.

In this mode, the top-level agent is no longer primarily the direct operator. It becomes a coordinator. Instead of carrying out most actions itself, it decides what work should be delegated, launches worker agents, receives their results, and determines what should happen next.

This changes the meaning of execution inside the loop. In the direct mode, the top-level agent reasons and acts within the same working context. In the orchestrated mode, the top-level agent remains inside the same live loop, but execution is delegated to workers. The top-level loop therefore becomes a coordination loop rather than a direct action loop. That is very close to the planner–executor split described in the article on T1-K: one layer retains direction and synthesis, while another layer carries the concrete operational burden.

At a high level, the coordinator does four things:

It interprets the user’s goal.
It decides whether the request should be answered directly or delegated.
It assigns work to one or more workers.
It synthesizes the returned results into the next decision, which may be to continue the same worker, launch a fresh worker, verify the result, revise the plan, or answer the user immediately.
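The coordinator's responsibilities can be sketched as a small loop. This is an illustration only: `Delegate`, `Answer`, `coordinate`, and `scripted_decide` are hypothetical names, only two of the possible next decisions are modeled, and the decision logic is scripted where the real system would rely on the model.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Delegate:
    """Hypothetical decision: hand a task to a worker."""
    task: str


@dataclass
class Answer:
    """Hypothetical decision: respond to the user and end the turn."""
    text: str


Decision = Union[Delegate, Answer]


def coordinate(
    decide: Callable[[str, list[str]], Decision],
    worker: Callable[[str], str],
    goal: str,
    max_rounds: int = 5,
) -> str:
    """Interpret, delegate, collect results, then decide the next step."""
    results: list[str] = []
    for _ in range(max_rounds):
        decision = decide(goal, results)
        if isinstance(decision, Answer):
            return decision.text
        results.append(worker(decision.task))  # the worker does the concrete work
    return "stopped: round budget exhausted"


def scripted_decide(goal: str, results: list[str]) -> Decision:
    """Stand-in for the coordinator model: research, then verify, then answer."""
    if len(results) == 0:
        return Delegate(task=f"research: {goal}")
    if len(results) == 1:
        return Delegate(task="verify the research findings")
    return Answer(text=f"synthesized {len(results)} worker reports")


final = coordinate(scripted_decide, lambda task: f"report on '{task}'", "fix the flaky test")
print(final)
```

Note that the coordinator never touches files or commands here; it only sees the goal and the reports that workers return.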

The coordinator prompt used in this mode explicitly frames the top-level agent as responsible for decomposition, synthesis, and user-facing communication, while workers perform the underlying research, implementation, and verification work.

The workers are the concrete execution surface. They inspect files, run commands, edit code, test changes, and gather evidence. In other words, they do the low-level work that the coordinator no longer performs directly. This again matches an important point from the article: execution becomes a separate layer with its own local context, rather than something constantly mixed back into the planner’s working state. Claude Code reaches a similar outcome by giving the coordinator a restricted tool surface and giving workers the concrete engineering tools.
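A restricted tool surface can be sketched as two separate registries with a check at call time. The tool names and the `call_tool` helper below are hypothetical; the only thing taken from the analysis above is the idea that the coordinator and the workers are given different sets of tools.

```python
from typing import Callable

ToolSurface = dict[str, Callable[[str], str]]

# Hypothetical tool names, chosen only for illustration.
WORKER_TOOLS: ToolSurface = {
    "read_file": lambda arg: f"read {arg}",
    "run_command": lambda arg: f"ran {arg}",
    "edit_file": lambda arg: f"edited {arg}",
}
COORDINATOR_TOOLS: ToolSurface = {
    "launch_worker": lambda arg: f"worker started for: {arg}",
}


def call_tool(surface: ToolSurface, name: str, arg: str) -> str:
    """Enforce the boundary in code: a tool outside the surface cannot run."""
    if name not in surface:
        raise PermissionError(f"tool '{name}' is not available to this agent")
    return surface[name](arg)


ok = call_tool(COORDINATOR_TOOLS, "launch_worker", "inspect auth module")
try:
    call_tool(COORDINATOR_TOOLS, "run_command", "make test")
    blocked = False
except PermissionError:
    blocked = True
print(ok, blocked)
```

The point of the design is that the restriction lives in the runtime rather than in prompt text, so the coordinator cannot run a shell command even if it asks to.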

This division matters because it separates high-level control from low-level execution while keeping the control loop close to the model. The coordinator can stay focused on planning, routing, and synthesis, while workers absorb the noise of shell activity, file inspection, intermediate outputs, and implementation details. That makes the top-level reasoning context cleaner and better suited to multi-stage work. In that sense, Claude Code is close to the architecture from the article not because it has the exact same roles, but because it solves the same problem in a similar direction: protect the planning context from execution noise.

The loop therefore changes shape. In direct mode, it is best understood as a ReAct-style cycle: produce, act, observe, update, and continue. In orchestrated mode, the same loop becomes: interpret, delegate, observe returned results, assess what they imply, and decide what should happen next. The runtime is still iterative, but the top-level agent is no longer the sole executor. This is very close to the article’s idea that the loop should remain simple while the model retains control over what happens next.

This also changes how parallelism is used. In the direct mode, the top-level agent works through the task inside one active context. In the orchestrated mode, independent work can be fanned out across multiple workers. Research can proceed along several lines at once. Implementation can be separated from verification. Follow-up work can be assigned either to the same worker, when its context is still useful, or to a new worker, when a cleaner context is more valuable. That is again very close to the architecture described in the article, where separation of roles is used to reduce context pollution and improve stability across multi-step tasks.
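Fanning independent work out to several workers can be sketched with a thread pool. This is an assumption for illustration, not a description of how Claude Code schedules its workers: `run_worker` and `fan_out` are hypothetical names, and each worker is a stand-in function.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(task: str) -> str:
    """Stand-in worker; in a real system each call would get a fresh context."""
    return f"result for {task}"


def fan_out(tasks: list[str], max_workers: int = 4) -> list[str]:
    """Run independent tasks concurrently; results come back in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_worker, tasks))


reports = fan_out(["research the API", "research the tests", "check the docs"])
for report in reports:
    print(report)
```

Only the returned reports reach the coordinator, so the intermediate noise of each line of research stays inside the worker that produced it.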

There is, however, an important difference. Claude Code’s orchestrated mode does not appear to introduce a fully separate assessor as a first-class runtime role. In the article’s architecture, planning, execution, and assessment are separated more cleanly. In Claude Code, much of that assessment function appears to remain inside the coordinator itself, supplemented by dedicated verification workers when needed. So the resemblance is strong, but it is not exact: Claude Code looks close to a planner–executor design, but less explicitly separated than the planner–executor–assessor loop described earlier.

The key point is that this mode does not introduce a second independent runtime. The same session controller still manages persistence, framing, recovery, and continuity. The same turn loop still supervises execution for one user request. What changes is that the top-level agent no longer acts mainly through direct tool use. It acts by coordinating other agents inside the same harness.

Seen this way, the difference between the two modes is straightforward. In direct execution mode, the top-level agent remains both reasoner and operator. In orchestrated execution mode, the top-level agent remains the reasoner, but hands most concrete execution outward to workers and turns the inner loop into a coordination process. And this is precisely why the mode feels close to the architecture from the article: not because it duplicates it perfectly, but because it moves in the same direction—toward cleaner planning, separated execution, and a control loop that stays close to the model.