Skip to main content

Writing · June 20, 2026

A Drill-Down on Coding Harnesses

How agentic coding tools are actually built: the parts, the loop, the seams, and how you extend them. A from-the-source breakdown built around two real harnesses, pi and OpenCode.

Reading guide

  • New to this? Read 1 → 3 → 9 (the journey). You'll understand the whole shape.
  • Building an extension? Read 6 → 8 → 11 → 12.
  • Comparing tools? Read 2 → 7 → 10.

Every diagram comes with a "how to read it" note. If a box is unclear, the prose under the diagram explains it.


1. What a coding harness is (and isn't)

An LLM by itself is just a text-in/text-out function. You can paste code into a chat window and it will suggest a fix, but it can't open your files, run your tests, or remember what it did ten minutes ago. To turn that raw intelligence into something that can actually fix a bug across twelve files, you need scaffolding around it.

That scaffolding is the coding harness.

A coding harness is the fixed structure that wraps a language model so it can act inside a development environment. Concretely, every coding harness does five jobs:

  1. Conversation: holds a session: your messages, the model's replies, tool calls and results.
  2. Tools: gives the model capabilities: read/write files, run shell commands, search code, fetch URLs.
  3. A loop: drives the cycle: model thinks → calls a tool → tool runs → result goes back → model thinks again, until the model is done.
  4. Context management: decides what the model sees on each turn (system prompt, history, truncation, compaction).
  5. Surfaces: a way for you to steer it (CLI/TUI/IDE) and a way for code to extend it (hooks/plugins/tools).

The model is the agent, the thing with intelligence. The harness is the structure that makes that intelligence operative. They are different layers; conflating them is the first mistake people make.

The name, and the best analogy

The word comes from test harness (the runner, setup/teardown, assertions, reporter that wraps your test functions) and ultimately from the horse harness (the gear that lets you steer and direct an animal's power). The dev analogy lands best:

Your test harness (describe/it + runner + reporter) doesn't contain your test logic, it provides the structure your test logic runs inside. A coding harness is the same idea one level up: a fixed structure (loop, tools, lifecycle, hooks) around a variable payload (whatever model you plug in). Swap the model, keep the harness.

What is not a harness

ThingWhy it's not a harness
A chatbot UI (ChatGPT, Claude.ai)No tools, no loop, no environment access. Text in, text out.
A raw API call to an LLMNo loop, no memory, no tools. Just a function.
An IDE (VS Code, JetBrains)It's an editor. It can host a harness (e.g. an agent extension), but the editor is not itself the harness.
An LLM with a "tools" parameterThat's an ingredient. You still need the loop, the registry, the context manager.
A single prompt templateThat's content, not structure.

The useful diagnostic: can it change files on disk and run commands in a loop until a task is done, without you babysitting each step? If yes, it's a harness.


2. The universal anatomy

Despite surface differences, almost every coding harness has the same skeleton. Here it is, with the parts every implementation has in some form.

Diagram 1, Universal anatomy of a coding harness

                  ┌──────────────────────────────┐
                  │      User surface            │  CLI · TUI · IDE panel · SDK · HTTP API
                  └───────────────┬──────────────┘
                                  │  prompt + files + images
                  ┌───────────────▼──────────────┐
                  │      Orchestrator            │  session, the turn loop, branching,
                  │      (the agent loop)        │  compaction, stop conditions
                  └─┬─────────────┬────────────┬─┘
            ┌───────┘             │            └────────┐
            ▼                     ▼                     ▼
   ┌────────────────┐   ┌──────────────────┐   ┌────────────────────┐
   │  Model adapter │   │  Tool layer      │   │  Context manager   │
   │  (provider)    │   │  (capabilities)  │   │  system prompt,    │
   │                │   │                  │   │  history, truncat- │
   │ Anthropic /    │   │  registry of     │   │  ion, compaction   │
   │ OpenAI / Google│   │  available tools │   │                    │
   │ / local / ...  │   │                  │   │                    │
   └───────┬────────┘   └────────┬─────────┘   └────────────────────┘
           │                     │
           │            ┌────────┴─────────┬───────────────┐
           │            ▼                  ▼               ▼
           │        built-ins         extension        MCP servers
           │       (read, write,      tools            (out-of-process,
           │        edit, bash, …)    (plugin/         JSON-RPC)
           │                          extension)
           │            │                  │               │
           └────────────┴──────────────────┴───────────────┘
                                  │
                                  ▼
                       ┌──────────────────────┐
                       │   Environment        │  filesystem, shell,
                       │   (the real world)   │  git, network, processes
                       └──────────────────────┘

How to read it: follow the prompt downward. The user surface receives your input and forwards it to the orchestrator, which is the brain. The orchestrator talks to three peer subsystems, the model adapter (how it calls the LLM), the tool layer (what the LLM is allowed to do), and the context manager (what the LLM gets to see). Tools ultimately reach the environment (your actual disk, shell, git, network). The bottom row is reality; everything above it is software.

The three peer subsystems are the crux. A harness's character is mostly determined by how it answers three questions:

  • Model adapter: which providers? How easy to add one? (See §7.)
  • Tool layer: built-in tools or thin core? MCP-native or not? (See §4, §8.)
  • Context manager: how aggressively does it trim? Can you customize compaction? (See §5.)

Different harnesses make different bets on these three, and that's where the interesting comparisons live (§10).


3. The agent loop, the heart

If you strip everything else away, the harness exists to run this loop. Understanding it cold is 80% of understanding every harness.

Diagram 2, The turn loop

        ┌────────────────────────────────────────────────────┐
        │  A user message is ready to process                │
        └────────────────────────┬───────────────────────────┘
                                 ▼
   ┌──────────────────────────────────────────────────────────┐
   │  1. ASSEMBLE CONTEXT                                    │
   │     system prompt + conversation history + attached     │  ◀── hooks can
   │     files                                                │      rewrite these
   │  2. BUILD REQUEST                                       │  ◀── hooks can set
   │     temperature, max tokens, headers, provider opts     │      params/headers
   └────────────────────────┬─────────────────────────────────┘
                            ▼
                   ┌──────────────────┐
                   │   3. CALL THE LLM │══════════► stream of tokens
                   └─────────┬────────┘
                             │  response = text  and/or  tool calls
                             ▼
          ┌────────────────────────────────────────────┐
          │  4. Did the model request any tools?       │
          └──────┬──────────────────────────┬──────────┘
             YES │                    NO    │
                 ▼                         │
   ┌─────────────────────────────────────┐ │
   │  5. RESOLVE each tool in the        │ │  ◀── hook: permission
   │     registry                        │ │      (allow/deny)
   │  6. GATE & MUTATE its arguments     │ │  ◀── hook: tool.before
   │  7. EXECUTE it                      │ │      (built-in / extension / MCP)
   │  8. MUTATE the result               │ │  ◀── hook: tool.after
   └──────────────────┬──────────────────┘ │
                      │ tool result(s)     │
                      ▼                    │
          append results to context ───────┤
                      │                    │
              ┌───────▼────────┐           │
              │ loop back to 1 │           │
              └────────────────┘           │
                                        ───┘
                                          ▼
                          ┌──────────────────────────────┐
                          │ 9. FINAL ANSWER → user       │
                          │    (no more tool calls)      │
                          └──────────────────────────────┘

How to read it: this is one turn expanded. A turn is "ask the model once, do whatever it asks." If the model asks for tools (left branch), you run them, feed results back, and loop, same model, richer context. When the model finally replies with only text (right branch), the turn ends and you show the answer.

The two truths this diagram encodes:

  1. The model is not in control of the loop, the harness is. The model only suggests tool calls; the harness decides whether to run them, in what order, with what arguments, and whether to send results back. The "agentic" feeling emerges from the harness faithfully running the loop, not from the model steering itself.
  2. Every numbered step is a potential seam. Steps 1, 2, 5/6, 8 are where extensions plug in. We'll cover that in §6.

Stop conditions: the loop ends when (a) the model returns no tool calls, (b) a tool or hook signals termination, (c) a token/turn budget is hit, or (d) the user aborts. Good harnesses make these explicit.

Parallel tool calls: modern harnesses let the model request several tools at once in a single turn. The harness then preflights them (checking permissions, mutating args) and executes them concurrently, while keeping results ordered. This is why file-mutation safety (see the shaded box in §4) matters.


4. The tool layer

Tools are how the model touches reality. The design of the tool layer is the single biggest differentiator between harnesses, it determines whether a tool "exists" at all and how new ones appear.

Diagram 3, Tool resolution

                 Model: "call tool  X(args)"
                          │
                          ▼
                ┌──────────────────────┐
                │   Tool registry      │   a flat namespace:
                │                      │   tool-name  ──►  definition
                └──────────┬───────────┘   (description, schema, execute fn)
                           │ lookup "X"
        ┌──────────────────┼──────────────────────┬──────────────────┐
        ▼                  ▼                      ▼                  ▼
   ┌─────────┐      ┌──────────────┐      ┌──────────────┐   ┌──────────────┐
   │BUILT-IN │      │  EXTENSION   │      │     MCP      │   │   PROMPTED   │
   │ in-core │      │  tool        │      │   server     │   │   skill      │
   │         │      │ (plugin /    │      │ (separate    │   │ (no code;    │
   │ read    │      │  extension   │      │  process,    │   │  instructs   │
   │ write   │      │  registers)  │      │  JSON-RPC    │   │  the model)  │
   │ edit    │      │              │      │  over stdio/ │   │              │
   │ bash    │      │              │      │  HTTP/SSE)   │   │              │
   └────┬────┘      └──────┬───────┘      └──────┬───────┘   └──────┬───────┘
        │                  │                     │                  │
        ▼                  ▼                     ▼                  ▼
   run locally        run via hook          marshal args,          model uses the
   (fs / shell /      (same process         call server,           existing tools
   git / …)           or over a client)     get result back        per instructions
        │                  │                     │
        └──────────────────┴─────────────────────┘
                           │
                           ▼
                      tool result
                  (fed back to the model)

How to read it: when the model says "call X," the harness looks X up in one flat registry: the model doesn't know or care where X came from. The registry silently routes X to one of four possible sources. The rightmost one (prompted skill) isn't really a tool at all, it's just instructions that change how the model uses the other three.

The four sources, in plain English:

  1. Built-in tools: shipped with the harness core. Always available, run in-process, fastest. Examples: read, write, edit, bash. Harnesses differ wildly on how many of these ship (see §8).
  2. Extension tools: registered by a plugin/extension via code. Same power as built-ins if the extension runs in-process; slightly more constrained if it runs out-of-process. This is what a spreadsheets extension is.
  3. MCP servers: the Model Context Protocol is a standard (JSON-RPC) for separate programs to expose tools to agents. You run an MCP server (a database connector, a Jira client, anything), point your harness at it, and its tools appear in the registry. The big win: language-agnostic, isolated, shareable. The big cost: a round-trip per call and no access to the harness's internal loop.
  4. Prompted skills: not code at all. A skill is a Markdown document that instructs the model ("when the user asks to deploy, follow these steps using bash"). It extends behavior by changing the prompt, not by adding tools. Both major harnesses support the open Agent Skills standard for these.

Shaded note, file-mutation safety. When the model calls several tools in one turn and two of them edit the same file (say, ythe extension and the built-in edit), naive execution loses data: both read the original, both write, last write wins. Robust harnesses serialize mutations to the same path through a queue so each edit sees the previous one. When you write an extension that mutates files, route it through that queue, don't roll your own read-modify-write.


5. Context management

The model has a finite context window. A long session will eventually exceed it. Context management is how the harness decides what fits.

Four techniques, in roughly the order they kick in:

  1. Tool-output truncation. A bash command that dumps 50,000 lines gets truncated to the last N. Each tool declares its own truncation strategy.
  2. System prompt budgeting. The system prompt, tool definitions, and loaded context files (AGENTS.md, skills) all count. Harnesses prune or summarize these.
  3. History pruning. Old, low-relevance turns get dropped, keeping recent ones verbatim.
  4. Compaction. When the window is nearly full, the harness summarizes the older conversation into a short recap and replaces the raw history with it. It's lossy but powerful, and most harnesses let extensions customize how the summary is produced (a hook).

The pattern to remember: compaction is where long-running agent work lives or dies. A harness whose compaction you can't inspect or customize will silently forget the thing you cared about. Look for a /tree-style view that keeps the full raw history on disk so you can always go back.


6. Mutation surfaces, hooks

The loop in §3 has numbered steps. Almost every step is a place the harness will call out to your code before continuing. Those call-outs are hooks (a.k.a. events, lifecycle callbacks). Hooks are the single most powerful extension mechanism, they let you reshape the harness's behavior without forking it.

Diagram 4, Where hooks fire along the turn loop

   ┌────────────────────────────────────────────────────────────────────┐
   │  BEFORE A TURN                                                      │
   │   • input transform        (rewrite/redirect user's prompt)         │
   │   • before_agent_start     (inject a message, change system prompt) │
   └────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
   ┌────────────────────────────────────────────────────────────────────┐
   │  WHILE BUILDING THE LLM REQUEST                                     │
   │   • context / messages.transform (rewrite conversation history)     │
   │   • system prompt transform   (rewrite the system prompt)           │
   │   • params / headers          (temperature, max tokens, HTTP hdrs)  │
   │   • before_provider_request   (inspect/replace the raw payload)     │
   └────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
                          CALL THE MODEL
                                  │
                                  ▼
   ┌────────────────────────────────────────────────────────────────────┐
   │  AROUND EACH TOOL CALL                                              │
   │   • tool_call (before)   →  permission gate (allow/deny)            │
   │                          →  mutate arguments in place               │
   │   • tool_result (after)  →  mutate content / details / error flag   │
   └────────────────────────────────────────────────────────────────────┘
                                  │
                                  ▼
                          LOOP OR FINISH
                                  │
                                  ▼
   ┌────────────────────────────────────────────────────────────────────┐
   │  SESSION-LIFETIME                                                   │
   │   • session_start / shutdown   (init & cleanup resources)           │
   │   • compaction hooks            (customize how history is summarized)│
   │   • model_select               (react to model switches)            │
   │   • event bus                  (subscribe to *everything*)          │
   └────────────────────────────────────────────────────────────────────┘

How to read it: the four horizontal bands map to the four phases of the loop (§3). Each bullet is a distinct hook. A hook is just a function you register that the harness calls at that moment. You can read state and either return nothing (do nothing) or return a patch that changes what happens next.

The key insight: hooks turn the harness from a closed program into a programmable interpreter. With the right hooks you can build:

  • Permission systems: block rm -rf or writes to .env at the tool_call gate.
  • Auto-checkpointing: git stash at the start of every turn via before_agent_start.
  • Custom compaction: summarize history your way at the compaction hook.
  • Observability: log every tool call and cost via the event bus.
  • Plan mode, sub-agents, todos: features some harnesses ship and others leave to you (§8).

The tradeoff hooks encode: the more hooks a harness exposes, the more you can reshape it and the more you must trust the code doing the reshaping (hooks run with your full permissions). Hook-rich harnesses need a project trust flow: "is this folder allowed to load extensions from?" That's not bureaucracy, it's the direct consequence of giving extensions deep access.


7. Two hosting architectures

There's a fundamental axis every harness picks: does the extension run inside the agent loop, or in a separate process that talks to it? This choice cascades into everything else.

Diagram 5, In-process vs. server+client

   A)  IN-PROCESS  (example: pi)              B)  SERVER + CLIENT  (example: OpenCode)

   ┌───────────────────────────────┐          ┌──────────┐         ┌──────────────────────────┐
   │        one process            │          │  client  │  HTTP   │     server process       │
   │ ┌───────────────────────────┐ │          │ (TUI /   │ ───────►│  (the harness itself)    │
   │ │  agent loop               │ │          │  CLI /   │ ◄───────│ ┌──────────────────────┐ │
   │ │  + tools (built-in & ext) │ │          │  SDK /   │  JSON   │ │   agent loop         │ │
   │ │  + extension hooks ───────┼─┼── deep,  │  IDE)    │         │ │   + tools            │ │
   │ │  + UI            (in-proc,│ │  direct  │          │         │ │   + plugin loader    │ │
   │ └───────────────────────────┘ │  access) └──────────┘         │ └──────────┬───────────┘ │
   └───────────────────────────────┘                               └────────────┼─────────────┘
                                                                               │
                                                                  plugins & MCP servers
                                                                  are ALSO clients
                                                                               │
                                                                  ┌────────────▼───────────┐
                                                                  │  separate processes    │
                                                                  │  (MCP, plugins, …)     │
                                                                  └────────────────────────┘

How to read it: In (A), everything, loop, tools, and the extension's hooks, lives in one process. Ythe extension can read in-memory session state and call harness APIs directly. In (B), the harness is a server; the UI is one client among several, and your plugins/MCP servers are also clients, talking back over HTTP/JSON-RPC.

What each model optimizes for:

In-process (A)Server+client (B)
Extension powerVery high, direct access to the loop, all hooks, can replace built-insHigh but mediated, reach the server through a client API
Trust required of extensionsHigh (they share your process + permissions)Lower for MCP (isolated procs); still high for in-proc plugins
Latency per tool callOne memory callA network round-trip
Front-end flexibilityUsually one built-in UIMany front-ends can drive the same session (TUI, IDE, web)
Headless/remote useAwkward (need tmux, etc.)Natural (the server is the API)
Concurrency modelWatch out for shared mutable stateSerialization through the server

Neither is universally better. (A) is best for "reshape the agent itself." (B) is best for "many surfaces drive one agent, some of them untrusted or remote." Real products sometimes blend the two, e.g., in-process plugins plus out-of-process MCP.


8. The composition spectrum

Beyond hosting, harnesses differ on a second axis: how much ships in the core, and how you add to it. This is the "thick vs. thin" decision, and it's a deliberate philosophy, not an oversight.

Diagram 6, Two design axes

   AXIS 1  ──  HOW THICK IS THE CORE?
   thick core / built-in features                  thin core / build-it-yourself
   ◄─────────────────────────────────────────────────────────────────────►
        ships: plan mode, sub-agents, todos,           ships: read/write/edit/bash +
        MCP-native, permissions, LSP, web, …           grep/find/ls. Everything else
        you EXTEND by ENABLE/CONFIG                    you EXTEND by CODE
             │                                              │
         (thick)                                          (thin)

   AXIS 2  ──  HOW ISOLATED ARE EXTENSIONS?
   out-of-process / isolated                        in-process / co-equal
   ◄─────────────────────────────────────────────────────────────────────►
        MCP servers over JSON-RPC;                     plugins/extensions with rich
        safe, language-agnostic, can't reach           hooks; powerful, can reshape the
        the loop; round-trip per call                  loop; needs trust + permissions
             │                                              │
         (isolated)                                      (co-equal)

How to read it: these are two independent choices. A harness can be thick-core + isolated-extension, or thin-core + co-equal-extension, or any mix. Where a harness sits on both axes tells you its whole personality.

The honest tradeoff:

  • Thick core = faster start, less to build, but you live with the maintainers' opinions about how plan mode / sub-agents / permissions should work.
  • Thin core = more work upfront, but every opinionated feature is yours to build exactly as you want, and to remove.
  • Isolated extensions (MCP) = safer, shareable across tools, but can't reach inside the loop and pay a round-trip tax.
  • Co-equal extensions = maximum power and minimum friction, but require a trust model because they run with your credentials.

A good harness lets you combine them: a thin core that's friendly to both in-process extensions and MCP servers, so you pick the right tool per job.


9. One prompt's full journey

To make this concrete, here is the entire path of a single prompt, "summarize report.xlsx", through a harness that has a spreadsheets extension installed. Every label maps back to the earlier diagrams.

Diagram 7, End-to-end journey of one prompt

   [1] USER types "summarize report.xlsx" in the TUI
                    │
                    │ (Diagram 1: user surface → orchestrator)
                    ▼
   [2] ORCHESTRATOR receives message, starts a TURN  (Diagram 2: top)
                    │
                    ▼
   [3] CONTEXT MANAGER assembles what the model will see:
         system prompt
         + conversation history
         + tool catalog (lists "sheet_read" exists, from the extension)
         + any AGENTS.md / skills                          (Diagram 1: context mgr)
                    │
                    │  hooks here (§6) can rewrite any of it — none fire
                    ▼
   [4] MODEL ADAPTER builds the provider request         (Diagram 1: model adapter)
         params, headers, provider payload
                    │
                    │  hooks here (§6) can set temp/headers — none fire
                    ▼
   [5] CALL LLM (Anthropic/OpenAI/…) ──── stream back ───►
                    │
                    │  response: "I'll call sheet_read({path:'report.xlsx'})"
                    ▼
   [6] TOOL LAYER resolves "sheet_read":                 (Diagram 3)
         not built-in, not MCP → it's an EXTENSION tool.
                    │
                    ▼
   [7] GATE & MUTATE  (§6 hooks) — permission allows; no arg mutation
                    │
                    ▼
   [8] EXECUTE the extension: parses xlsx, returns rows  (the extension's code)
                    │
                    │  result-mutation hook (§6) could redact/trim — none fire
                    ▼
   [9] RESULT appended to context  ──►  back to step [3]  (Diagram 2: loop)
                    │
                    ▼
  [10] LLM writes the summary, no tool call  →  loop ENDS
                    │
                    ▼
  [11] ANSWER rendered back to the TUI  (Diagram 1: surface)

How to read it: the prompt passes through every subsystem, surface, orchestrator, context manager, model adapter, tool layer, environment, and the loop happens once with a tool call before finishing. Notice how little of this is the extension: steps 1-7 and 9-11 are pure harness. The extension's code is only step 8. That ratio, 10% your code, 90% harness, is the whole point of building on a harness instead of from scratch.


10. Case studies, two real harnesses

This is the part I can ground. The two harnesses below were verified directly from their source and docs. Everything in earlier sections generalizes from these two.

pi, thin core, in-process, hook-rich

  • Hosting (§7): in-process (A). One program; extensions are TypeScript modules loaded via a JIT compiler with direct access to a rich ExtensionAPI.
  • Core thickness (§8): deliberately thin. Ships only read, write, edit, bash, grep, find, ls. Explicitly refuses to build plan mode, sub-agents, todos, background bash, permission prompts, and built-in MCP, each as a stated philosophical choice ("build it with extensions").
  • Extension model: extensions hook any lifecycle event (before_agent_start, tool_call, tool_result, context, before_provider_request, session_before_compact, …), register tools/commands/shortcuts/providers, replace built-in tools, rewrite the system prompt per turn, render custom UI. Very high power.
  • MCP: not built in. You either write a CLI tool with a README (a skill) or build an extension that adds MCP support.
  • Trust model: because extensions are co-equal and run with full permissions, pi has a project trust flow, it asks before loading extensions from an untrusted project folder.
  • Composability: skills follow the open Agent Skills standard; packages can be shared via npm or git.
  • Vibe: "a minimal terminal coding harness … adapt pi to your workflows, not the other way around."

OpenCode, thick core, server+client, MCP-native

  • Hosting (§7): server+client (B). A long-lived server process holds the agent loop; the TUI/CLI/SDK/desktop/web are clients over HTTP. Plugins are also clients, talking back through an SDK client.
  • Core thickness (§8): deliberately thick. Ships plan mode (plan), sub-agents (task), todos (todo), a user-question tool, webfetch/websearch, LSP integration (lsp), apply_patch, plus the usual file/shell/search tools, all built-in.
  • Extension model: plugins are functions returning a Hooks object (tool, permission.ask, chat.params, chat.headers, experimental.chat.system.transform, tool.execute.before/after, experimental.session.compacting, a global event hook, …). Hook categories map nearly 1:1 to pi's; the mechanism differs (return-Callbacks vs. event subscription; over-the-wire vs. in-process).
  • MCP: first-class. Connect MCP servers via config and their tools appear in the registry.
  • Config-driven extension: many capabilities are turned on through opencode.json (agents, MCP, permissions, models) rather than coded.
  • Vibe: batteries-included; configure and connect rather than build from scratch.

Side-by-side

DimensionpiOpenCode
Core tools shipped7 (file/shell/search only)many (+plan, task, todo, lsp, web, …)
Plan mode / sub-agents / todosrefused, build via extensionbuilt-in
Extension hostingin-process, co-equalserver-side plugins + out-of-proc MCP
MCPnot built in (use a skill or extension)first-class
Hook richnessvery high (any lifecycle event)very high (similar categories)
Composition stylecode-first (registerTool, events)mix of config + plugin code + MCP
Trust model neededyes (co-equal extensions)yes for plugins; lighter for MCP
Multi-surface (IDE/web driving one session)awkwardnatural (server is the API)

The takeaway: these aren't points on a "better/worse" line; they're two defensible answers to the same question, "should the common features be shipped or built?" pi bets that you'd rather build exactly your workflow; OpenCode bets that you'd rather configure a thick base. The underlying shape (loop, tools, hooks, context) is essentially the same.

Confidence note: pi and OpenCode details above were verified directly from their docs and source. Other well-known harnesses (Claude Code, Cursor's agent mode, Aider, Devin, Codex, etc.) share the same anatomy but I have not read their internals here, treat any specific claim about them as conceptual, not verified.


11. Harness vs. garden, how to extend well

Once you're inside a harness, every decision is gardening: you're planting through someone else's beds. Split your thinking into two layers and you'll avoid the two classic failure modes.

Diagram 8, Two-layer mental model for extending a harness

   ┌────────────────────────────────────────────────────────────────────┐
   │  HARNESS LAYER  (steel — not yours)                                 │
   │   the loop, the lifecycle, the tool-result contract, the hook       │
   │   signatures, the file-mutation queue, the trust model.             │
   │   These are PUBLISHED CONTRACTS owned by the harness maintainers.   │
   │                                                                     │
   │   your job: RESPECT them. Don't reach into internals, don't        │
   │   depend on unexported symbols, don't fork the core.                │
   └────────────────────────────────────────────────────────────────────┘
                                  ▲
                                  │ plant through the contracts
                                  │
   ┌────────────────────────────────────────────────────────────────────┐
   │  GARDEN LAYER  (soil — yours)                                       │
   │   your tools, skills, prompt templates, packages, hook handlers,    │
   │   the names you choose, the prompt snippets you write, whether      │
   │   you publish it.                                                   │
   │                                                                     │
   │   your job: DESIGN them — for composition, discoverability,         │
   │   survival across updates, and playing nicely with neighbours.      │
   └────────────────────────────────────────────────────────────────────┘

The two failure modes:

  • Only-garden-think → rot. You depend on an internal symbol, skip the mutation queue, collide with another extension's tool name, and it all breaks on the next harness update. You treated steel like soil.
  • Only-harness-think → over-engineering. You harden an exploratory script like it's infrastructure, never publish because "it's not done," and build a garage for a window box.

The discipline: for every choice ask is this a contract decision or a design decision? Contracts (use the mutation queue, throw to signal errors, survive reloads) → respect them, don't redesign them. Design (name your tool sheet_read not sr, add a prompt guideline so the model prefers it over cat) → make deliberate choices. And don't call it "building a harness", you're gardening on one.


12. Decision guide

Choosing a harness

  • Want plan mode / sub-agents / permissions out of the box? Lean thick-core (OpenCode-style).
  • Want to reshape the agent itself, or match a bespoke workflow? Lean thin-core, hook-rich (pi-style).
  • Need many front-ends (TUI + IDE + web) driving one session, or remote/headless use? Lean server+client.
  • Already invested in MCP servers? Pick something MCP-native.
  • Care about running untrusted extensions safely? Prefer isolated/MCP-style extension hosting; insist on a project-trust flow for anything in-process.

Extending a harness (how to pick the mechanism)

You want to…Use
Expose a capability (read a spreadsheet, query a DB) the model lacksan extension tool (in-process) or an MCP server (isolated)
Change behavior (gate dangerous commands, auto-commit, custom compaction)a hook (lifecycle event)
Give the model instructions for an existing capabilitya skill (Markdown; no code)
Add a reusable prompta prompt template
Add a model providera provider-registration hook or config
Share any of the abovebundle as a package (npm/git)

Rule of thumb: capability → tool; behavior → hook; knowledge → skill/sytem prompt. Mixing these up is the most common beginner mistake (e.g., writing a tool when a skill would do, or a hook when you needed a tool).

Writing a tool that survives

  1. Respect the tool-result contract (content for the model, details for rendering/state, throw to signal errors).
  2. Route file mutations through the harness's mutation queue, don't hand-roll read-modify-write.
  3. Make parameters strict; use a prepareArguments shim only for backward compat with old stored calls.
  4. Add a promptSnippet so the model knows the tool exists, and promptGuidelines so it knows when to prefer it over alternatives.
  5. Be a good neighbour: names that won't collide, errors that are informative, results that are sized for a context window.

Glossary

  • Harness: the scaffolding (loop, tools, context, surfaces) that turns an LLM into an acting agent. The fixed structure around a variable payload.
  • Agent: the LLM running inside the harness. The intelligent part; not the harness.
  • Turn: one ask-the-model cycle: build context → call model → (run tools if any) → loop or finish.
  • Tool: a capability the model can invoke; resolved through a registry from one of four sources (built-in / extension / MCP / prompted-skill).
  • Hook / event: a lifecycle callback the harness invokes at a defined point, letting your code reshape behavior.
  • Context management: deciding what the model sees each turn; includes truncation, pruning, and compaction.
  • Compaction: lossily summarizing old conversation to fit the context window; usually customizable via a hook.
  • MCP (Model Context Protocol): a JSON-RPC standard for separate programs to expose tools/data to agents. Isolated, language-agnostic, shareable.
  • Skill: a Markdown document (Agent Skills standard) that instructs the model, changing behavior without adding code.
  • Extension / plugin: code that adds tools and/or hooks to a harness.
  • Project trust: a security flow that asks whether a folder is allowed to load extensions, required by harnesses with co-equal (full-permission) extensions.
  • In-process vs. server+client: the hosting axis: extensions live inside the loop, or are clients talking to a server.
  • Thick vs. thin core: the philosophy axis: ship common features built-in, or leave them to be built.

Sources & confidence

  • pi (@earendil-works/pi-coding-agent): all claims verified against its shipped docs/ and examples/ and the live extension/API surface, read in full directly.
  • OpenCode (sst/opencode, dev branch): anatomy and extension model verified against packages/plugin/src/index.ts (the Hooks and PluginInput types), packages/opencode/src/tool/* (built-in tool set), and the top-level monorepo layout (packages/{tui,cli,sdk,server,desktop,web,plugin,mcp,…}).
  • Other harnesses (Claude Code, Cursor, Aider, Devin, Codex, etc.): mentioned only at the conceptual level of the universal anatomy (§2). No internals read; do not treat specifics as verified.
  • MCP, Agent Skills standard: referenced as widely-known open standards.

If you want any section deeper, e.g., a full walk-through of the context-manager math, a worked compaction example, or a verified comparison with a third harness, say the word and I'll go read that source and extend this document rather than guess.