Companion deck for Component 6 of Sebastian Raschka’s essay, plus a closing synthesis tying all six components together. Why delegating to a child agent is harder than spawning one, and what good harness design looks like once everything is on the table.
Most coding tasks fit in a single agent loop. The interesting ones don’t. Three failure modes push designers towards subagents:
The task is genuinely big — touch 30 files, read 50 test failures, follow a multi-hop dependency chain. Even with aggressive compaction (deck 06) the parent agent runs out of room. Delegation hands off pieces with their own context budget.
Mid-task, the agent needs to know “what configures the test database?”. Reading config files takes 6 turns and 4,000 tokens, after which the parent is back where it started but with a polluted context. A subagent answers in one summary line.
Three files need the same kind of edit; tests need to run while a build runs. A parent + N subagents run in parallel, each on a fast independent chain, then the parent integrates results.
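To make that fan-out/fan-in shape concrete, here is a minimal sketch; `spawn_subagent` is a hypothetical stub standing in for whatever spawn API a real harness exposes:

```python
import asyncio

async def spawn_subagent(role: str, task: str, max_turns: int) -> str:
    """Stub standing in for a real harness's spawn call (hypothetical)."""
    await asyncio.sleep(0)   # a real child would run its own agent loop here
    return f"[{role}] done: {task}"

async def parent(files: list[str]) -> list[str]:
    # Fan out: one bounded child per file, all running concurrently.
    children = [
        spawn_subagent(role="editor",
                       task=f"Apply the rename in {path}",
                       max_turns=10)
        for path in files
    ]
    # Fan in: the parent integrates summaries, never the children's transcripts.
    return await asyncio.gather(*children)

print(asyncio.run(parent(["models.py", "views.py", "tests/test_models.py"])))
```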
The article phrases the goal as: “parallelize certain work into subtasks via subagents and speed up the main task”. The word doing the most work in that sentence is “certain”. Subagents are not a generic decomposition tool; they are a targeted, bounded lever for specific patterns the harness recognises.
The article makes a sharp point: “not just how to spawn a subagent but also how to bind one”. Spawning is the easy part: instantiate a child agent with a fresh context. Binding is the hard part: decide what context, tools, approval policy, sandbox, and budget the child gets — and what flows back.
A poorly bound subagent is worse than no subagent. It either re-does work the parent already did (because too little context flowed in), or explodes the system’s effective context (because too much flowed back), or inherits write tools it didn’t need and corrupts state. The “spawn” API is one line of code. The binding policy is the design.
Two questions to answer for any subagent type: what flows in, and what flows out?
| Context layer | Typical inheritance | Why |
|---|---|---|
| Workspace summary | Always inherited | The child needs the same project context the parent used. |
| Tool definitions | Subset, not all | Read-only subagents shouldn’t have `write_file` in their schema at all. |
| Working memory | Snapshot at spawn time | The child uses it; doesn’t modify the parent’s. |
| Compact transcript | Usually not inherited | The child gets a focused task spec instead. Parent transcript is noise. |
| Task spec | Always | The single most important input: a one- to three-paragraph brief on what this child is for. |
| Approval policy | Inherited or tightened | Never relaxed below the parent’s; can be stricter (e.g., read-only). |
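One way to make those inheritance rules concrete is a declarative bind profile per subagent role. A sketch; every field and tool name here is illustrative, not any particular harness’s API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BindProfile:
    """What flows into a child at spawn time (illustrative fields)."""
    tools: tuple[str, ...]              # subset of the parent's tool schema
    inherit_workspace_summary: bool = True
    inherit_transcript: bool = False    # child gets a task spec instead
    approval_policy: str = "inherit"    # may be tightened, never relaxed
    max_turns: int = 8

PROFILES = {
    # Read-only research child: write tools absent from its schema entirely.
    "research": BindProfile(tools=("read_file", "grep", "list_dir")),
    # Editing child: write access, but behind a stricter approval gate.
    "editor": BindProfile(
        tools=("read_file", "write_file", "run_tests"),
        approval_policy="ask_before_write",
        max_turns=12,
    ),
}
```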
By far the most common return shape: a single string of 100–500 tokens. The child does many turns of work; the parent sees the result, not the work. This compresses rather than expands the parent’s context — the entire point of delegation.
Some subagents return more: a list of file paths, a JSON structure, a diff. Always give these a schema so the parent isn’t parsing free text. If the child wrote files, the side effect is the result; the return value is just “done, here’s a summary”.
{
    "name": "spawn_subagent",
    "args": {
        "role": "research",                        # selects bind profile
        "task": "Identify which file configures the test database fixture, "
                "and report its key options. Do not modify any files.",
        "important_files": ["tests/conftest.py"],  # inbound hint
        "max_turns": 8,                            # budget
        "return_schema": {
            "type": "object",
            "properties": {
                "file": {"type": "string"},
                "options": {"type": "object"},
                "summary": {"type": "string", "maxLength": 500}
            },
            "required": ["file", "summary"]
        }
    }
}
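On the way back, the parent can enforce that contract mechanically instead of parsing free text. A minimal sketch using the third-party `jsonschema` package, with the schema from the call above:

```python
import jsonschema  # third-party: pip install jsonschema

RETURN_SCHEMA = {
    "type": "object",
    "properties": {
        "file": {"type": "string"},
        "options": {"type": "object"},
        "summary": {"type": "string", "maxLength": 500},
    },
    "required": ["file", "summary"],
}

def accept_child_result(raw: dict) -> dict:
    # Reject a malformed return before it ever reaches the parent's context;
    # raises jsonschema.ValidationError on any contract violation.
    jsonschema.validate(instance=raw, schema=RETURN_SCHEMA)
    return raw
```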
Two safety properties harnesses get wrong if they aren’t designed in from the start.
If the parent runs in a Docker container or Git worktree, the child inherits the same sandbox by default. This is what Codex does explicitly: “subagents inherit sandbox and approval setup”. The child cannot widen the sandbox; the harness wires it that way.
Set a hard limit on recursion depth. Even depth 2 is unusual; depth 3+ is almost always a sign that the design is wrong (or the model is confused). A simple counter on the spawn tool prevents accidental fork-bombs.
If the depth-0 agent has 60 turns total, depth-1 children should each get a small fraction (say 8–12). The parent must be able to afford several spawns without exceeding its own budget. Concrete rule: child `max_turns` ≤ (parent remaining turns) / (expected fan-out + 2). If the parent doesn’t have the budget, it shouldn’t spawn.
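Both guards fit in a few lines at the spawn boundary. A sketch with illustrative numbers, not any specific harness’s logic:

```python
MAX_DEPTH = 2  # depth 3+ almost always signals a design problem

def spawn_budget(depth: int, parent_turns_left: int,
                 expected_fan_out: int) -> int | None:
    """Return the child's turn budget, or None if the spawn is disallowed."""
    if depth >= MAX_DEPTH:
        return None                            # hard cap on recursion
    budget = parent_turns_left // (expected_fan_out + 2)
    return budget if budget >= 1 else None     # parent can't afford the spawn

# A depth-0 parent with 60 turns left, fanning out to 3 children:
assert spawn_budget(0, 60, 3) == 12
```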
Raschka contrasts the two tools the article centres on. Both expose subagents, but the design choices differ in instructive ways.
| Property | Claude Code | Codex CLI |
|---|---|---|
| Subagent maturity | Long-standing — the Agent / Task tool has been part of the harness from early on | Added more recently — subagent support layered on top of an established CLI |
| Default scope | Read-only by default; specialised agent types unlock specific writes | Often not read-only; subagents can write within the sandbox |
| Inheritance | Inherits workspace context; gets a focused task spec; isolated transcript | Inherits sandbox and approval setup; integration is closer to the parent |
| Concurrency model | Designed for parallel spawn (multiple agents at once) | Typically serial — one subagent at a time |
| Configuration surface | Custom agent types via Markdown definitions; declarative | Codex configuration files / flags; closer to programmatic |
The pattern Claude Code encourages: spawn a research subagent for a side question, an Explore subagent to map an unfamiliar area, a code-reviewer for a second opinion. Each child is cheap, scoped, and parallel-safe. The parent integrates summaries.
The pattern Codex CLI encourages: hand off a self-contained slice of the task to a child that runs longer and may write within its sandbox. The parent gets the resulting branch state, not just a summary. Closer to a multi-process model than a multi-thread one.
The two designs reflect different bets about how teams use coding agents. Claude Code optimises for an interactive collaborator that asks for help mid-task; Codex CLI optimises for handing off bounded chunks of autonomous work. Treat them as different tools for different jobs, not as the same idea expressed differently.
Five subagent patterns that recur across well-designed harnesses. Each one is a specific bind profile — not a generic “spawn an agent and hope”.
Research: read-only tools, a 5–10 turn budget, returns a 200-token summary. Used when the parent needs a fact (“which file configures X?”) without polluting its own context. Safe, cheap, parallel-friendly.
Parallel edit: one child per file (or per directory), each with write access scoped to its target. Same task description for all (“rename FooBar to BarFoo”). The parent integrates by checking each returned diff.
Code review: a reviewer subagent receives a diff and produces a structured critique. No write tools. Often invoked after the parent thinks it’s done; the parent revises based on findings.
Autonomous slice: Codex-style. The parent hands off a contained task (“migrate this module to async”) to a child with full tool access in a fresh worktree. The child runs to completion; the parent sees the resulting branch.
Planner/executor: a reasoning-model planner produces a step-by-step plan; non-reasoning executor children run each step. The cheap, fast model executes; the expensive thinking model only plans. A tiered architecture.
And the anti-pattern: just spawn another agent and tell it to figure it out. No bind profile, no scoped task, no return contract. This is the failure mode — and unfortunately the easiest mistake to make. Always design the binding before you write the spawn call.
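One way to make that mistake structurally impossible: refuse any spawn that doesn’t name a bind profile and a return contract. A sketch, reusing the illustrative `PROFILES` table from the bind-profile sketch above:

```python
def spawn(role: str, task: str, return_schema: dict | None):
    """Hypothetical spawn entry point: binding is mandatory, not optional."""
    profile = PROFILES.get(role)   # the illustrative table defined earlier
    if profile is None:
        raise ValueError(f"no bind profile for role {role!r}; "
                         "design the binding before writing the spawn call")
    if not return_schema:
        raise ValueError("a return contract (schema) is required")
    # ... instantiate the child with profile.tools, profile.max_turns, etc.
```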
Each pattern produces a different shape of agent tree: how the parent hands off, what binding looks like, and how results merge back all follow from it.
The tree shape is not a free parameter — it is determined by the kind of work. Research is one-deep, narrow. Parallel-edit is one-deep, wide. Autonomous slice is one-deep but each child is heavyweight. Planner/executor adds a planning hop. When you find yourself drawing a different shape, it usually corresponds to a different pattern.
One last pass over Raschka’s framework, with a single sentence per component capturing what we’ve learned across the series. Read it as a checklist for any harness you build or evaluate.
Build a stable, cheap workspace summary (layout + commands + conventions) so the agent inherits what a colleague would already know.
Sort the prompt by stability and put a cache breakpoint after the last stable layer; pay full price only for what genuinely changed this turn.
Replace prose suggestions with named, schema-checked actions; gate them with the four validation questions and a clear approval UX.
Clip individual tool outputs, dedupe repeated reads, summarise old turns with recency weighting; treat “what the model sees” as a budget.
Maintain a small structured working memory (task, files, decisions, next) on disk that survives compaction and carries across turns.
Delegate side questions and parallel slices to bounded children; the spawn API is one line, the binding policy is the entire design.
Pick the agent tool you use most. Write down what it does for each of the six components. Where it’s strong and where it’s weak should jump out. Now do the same for a harness you’d build — and notice which component you have the least concrete answer for. That is, almost always, where to start.
Raschka’s essay is, in the end, a quiet reframing of where the engineering is happening in modern LLM applications. The model is the engine, but the engine is also a commodity that improves on its own quarterly schedule. The harness is the durable artefact. The choices made about context, tools, memory, and delegation outlast any specific model version. Building good harnesses is the new applied software engineering of LLM systems — and the article is the clearest map I’ve seen of that territory.
Pick one component you’ve been least intentional about in your own work. Spend an afternoon making it explicit — write a CLAUDE.md, profile your prompt, design a tool schema, write a clipping function. Watch what happens to the agent’s behaviour. The components are not abstractions; they are the levers that move the system. Pick one and pull it.