Coding Agents Internals Series — Presentation 07

Bounded Subagents & Synthesis

Companion deck for Component 6 of Sebastian Raschka’s essay, plus a closing synthesis tying all six components together. Why delegating to a child agent is harder than spawning one, and what good harness design looks like once everything is on the table.

Tags: Subagents · Binding · Sandboxing · Recursion depth · Claude Code · Codex · Synthesis

Flow: Main agent → Decide to delegate → Bind subagent → Sandboxed run → Return summary → Resume parent
00

Topics We’ll Cover

Why delegate at all · Spawn vs bind · What context crosses the boundary · Sandboxing & recursion depth · Claude Code vs Codex CLI · A pattern catalogue · The subagent tree · Synthesis of all six components · Reading list

01

Why Delegate at All?

Most coding tasks fit in a single agent loop. The interesting ones don’t. Three failure modes push designers towards subagents:

Context overflow

The task is genuinely big — touch 30 files, read 50 test failures, follow a multi-hop dependency chain. Even with aggressive compaction (deck 06) the parent agent runs out of room. Delegation hands off pieces with their own context budget.

Side questions

Mid-task, the agent needs to know “what configures the test database?”. Reading config files takes 6 turns and 4,000 tokens, after which the parent is back where it started but with a polluted context. A subagent answers in one summary line.

Parallelisable work

Three files need the same kind of edit; tests need to run while a build runs. A parent + N subagents run in parallel, each on a fast independent chain, then the parent integrates results.

Raschka’s framing

The article phrases the goal as: “parallelize certain work into subtasks via subagents and speed up the main task”. The word doing the most work in that sentence is “certain”; the word this deck adds is “bounded”. Subagents are not a generic decomposition tool; they are a targeted lever for specific patterns the harness recognises.

02

Spawn vs Bind — The Distinction That Matters

The article makes a sharp point: “not just how to spawn a subagent but also how to bind one”. Spawning is the easy part: instantiate a child agent with a fresh context. Binding is the hard part: decide what context, tools, approval policy, sandbox, and budget the child gets — and what flows back.

Parent agent: full context, tools, memory · approval policy = confirm-novel · recursion depth = 0 · turn budget = 60
  ├─ Subagent (research): read-only tools · no shell · no write · scoped task: “find auth config” · depth = 1, budget = 8 turns
  └─ Subagent (parallel-edit): scoped to src/handlers/*.py · approval inherits from parent · depth = 1, budget = 12 turns
Each child binds a summary back to the parent.

What “binding” encompasses

The task spec the child receives, the subset of tools exposed in its schema, the approval policy it runs under, the sandbox it inherits, the turn and token budget it is given, and the contract for what flows back.

Why “bind” matters more than “spawn”

A poorly-bound subagent is worse than no subagent. It either re-does work the parent already did (because too little context flowed in), or it explodes the system’s effective context (because too much flowed back), or it inherits write tools it didn’t need and corrupts state. The “spawn” API is one line of code. The binding policy is the design.
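As a concrete illustration, here is a minimal sketch of what a bind profile might look like in a Python harness. Every name below is illustrative, not any real tool’s API:

from dataclasses import dataclass

@dataclass(frozen=True)
class BindProfile:
    """Everything the harness decides at spawn time (illustrative fields)."""
    tools: tuple[str, ...]        # subset of the parent's tool surface
    max_turns: int                # hard turn budget for the child
    approval: str                 # inherited or tightened, never relaxed
    sandbox: str                  # "inherit": the child cannot widen it
    return_max_tokens: int = 500  # cap on what flows back to the parent

# A read-only research child vs. a scoped parallel-edit child.
RESEARCH = BindProfile(
    tools=("read_file", "grep", "list_dir"),
    max_turns=8, approval="read-only", sandbox="inherit", return_max_tokens=200,
)
PARALLEL_EDIT = BindProfile(
    tools=("read_file", "write_file", "run_tests"),
    max_turns=12, approval="inherit", sandbox="inherit",
)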

03

What Context Crosses the Boundary

Two questions to answer for any subagent type: what flows in? and what flows out?

Inbound — what the child sees

Context layer | Typical inheritance | Why
Workspace summary | Always inherited | The child needs the same project context the parent used.
Tool definitions | Subset, not all | Read-only subagents shouldn’t have write_file in their schema at all.
Working memory | Snapshot at spawn time | The child uses it; it doesn’t modify the parent’s copy.
Compact transcript | Usually not inherited | The child gets a focused task spec instead; the parent transcript is noise.
Task spec | Always | The single most important input: a one- to three-paragraph brief on what this child is for.
Approval policy | Inherited or tightened | Never relaxed below the parent’s; can be stricter (e.g., read-only).
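A Python sketch of the inbound side, following the table above. The parent object and its fields are hypothetical; the point is what gets copied, subsetted, snapshotted, or dropped:

APPROVAL_ORDER = ["read-only", "confirm-novel", "auto-approve"]  # strictest first

def tighten(parent_policy: str, child_policy: str) -> str:
    """The child's approval can be stricter than the parent's, never looser."""
    return min(parent_policy, child_policy, key=APPROVAL_ORDER.index)

def build_child_context(parent, task_spec: str, profile) -> dict:
    return {
        "workspace_summary": parent.workspace_summary,  # always inherited
        "tools": [t for t in parent.tools
                  if t["name"] in profile.tools],        # subset, not all
        "working_memory": dict(parent.working_memory),   # snapshot, not a shared reference
        "transcript": [],                                # parent transcript stays behind
        "task_spec": task_spec,                          # the focused brief
        "approval": tighten(parent.approval, profile.approval),
    }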

Outbound — what comes back

The summary contract

By far the most common return shape: a single string of 100–500 tokens. The child does many turns of work; the parent sees the result, not the work. This compresses rather than expands the parent’s context — the entire point of delegation.

Structured artefacts

Some subagents return more: a list of file paths, a JSON structure, a diff. Always put a schema on these returns so the parent isn’t parsing free text. If the child wrote files, the side-effect is the result; the return value is just “done, here’s a summary”.

A subagent invocation contract (annotated; strict JSON permits no comments)
{
  "name": "spawn_subagent",
  "args": {
    "role": "research",                        // selects bind profile
    "task": "Identify which file configures the test database fixture, and report its key options. Do not modify any files.",
    "important_files": ["tests/conftest.py"],  // inbound hint
    "max_turns": 8,                            // budget
    "return_schema": {
      "type": "object",
      "properties": {
        "file":    {"type": "string"},
        "options": {"type": "object"},
        "summary": {"type": "string", "maxLength": 500}
      },
      "required": ["file", "summary"]
    }
  }
}
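On the harness side, enforcing that return contract might look like the following Python sketch, using the third-party jsonschema package; the function name is illustrative:

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

def accept_child_result(raw: str, return_schema: dict) -> dict:
    """Parse and validate a child's final message against the declared schema."""
    try:
        result = json.loads(raw)
        validate(instance=result, schema=return_schema)
    except (json.JSONDecodeError, ValidationError) as exc:
        # A child that returns free text instead of the contracted JSON is a
        # failed delegation, not something to paste into the parent's context.
        raise RuntimeError(f"subagent broke its return contract: {exc}") from exc
    return result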
04

Sandboxing & Recursion Depth

Two safety properties harnesses get wrong if they aren’t designed in from the start.

Sandbox inheritance

If the parent runs in a Docker container or Git worktree, the child inherits the same sandbox by default. This is what Codex does explicitly: “subagents inherit sandbox and approval setup”. The child cannot widen the sandbox; the harness wires it that way.
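A minimal sketch of that invariant, with hypothetical names: the child’s sandbox is the intersection of what it requests and what the parent has, so it can only narrow.

from dataclasses import dataclass

@dataclass(frozen=True)
class Sandbox:
    root: str      # filesystem subtree the agent may touch
    network: bool  # outbound network allowed?

def child_sandbox(parent: Sandbox, requested: Sandbox) -> Sandbox:
    """The child gets the intersection: it may narrow the sandbox, never widen it."""
    root = requested.root if requested.root.startswith(parent.root) else parent.root
    return Sandbox(root=root, network=parent.network and requested.network)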

Recursion depth

Set a hard limit. Even depth 2 is unusual; depth 3+ is almost always a sign that the design is wrong (or the model is confused). A simple counter on the spawn tool prevents accidental fork-bombs.

depth=0 — main user-facing agent (turn budget 60, full tool surface)
↓ spawn
depth=1 — e.g. research subagent (turn budget 8, read-only)
↓ spawn
depth=2 — rarely needed; sometimes for nested file-search
↓ blocked
depth=3 — HARD STOP. Spawn tool refuses with error.
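A sketch of that counter as a guard on the spawn tool itself; make_agent is a hypothetical constructor, and the parent’s depth field is an assumption of this sketch:

MAX_DEPTH = 2  # depth 3+ is refused outright

def spawn_subagent(parent, role: str, task: str, **bind_kwargs):
    if parent.depth + 1 > MAX_DEPTH:
        # Surface a tool error the model can read, rather than crashing the run.
        return {"error": f"spawn refused: recursion depth limit ({MAX_DEPTH}) reached"}
    child = make_agent(role=role, task=task, depth=parent.depth + 1, **bind_kwargs)
    return child.run()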

Why bound depth aggressively

Token budgets cascade

If the depth-0 agent has 60 turns total, depth-1 children should each get a small fraction (say 8–12). The parent must be able to spawn several children without exceeding its own budget. Concrete rule: child max_turns ≤ (parent’s remaining turns) / (expected fan-out + 2). If the parent doesn’t have the budget, it shouldn’t spawn.
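That rule is a one-liner; a Python sketch with illustrative numbers (the minimum-budget threshold is an assumption of this sketch):

def child_turn_budget(parent_remaining: int, expected_fanout: int) -> int:
    """Concrete rule from above, leaving headroom for the parent's integration turns."""
    budget = parent_remaining // (expected_fanout + 2)
    if budget < 3:  # below a few turns a child cannot do useful work: don't spawn
        raise RuntimeError("not enough budget left to delegate")
    return budget

# e.g. 40 turns remaining with an expected fan-out of 3 gives each child 8 turns
assert child_turn_budget(40, 3) == 8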

05

Claude Code vs Codex CLI — Two Subagent Models

Raschka contrasts the two tools the article centres on. Both expose subagents, but the design choices differ in instructive ways.

Property | Claude Code | Codex CLI
Subagent maturity | Long-standing — the Agent / Task tool is part of the harness from early on | Added more recently — subagent support layered on top of an established CLI
Default scope | Read-only by default; specialised agent types unlock specific writes | Often not read-only; subagents can write within the sandbox
Inheritance | Inherits workspace context; gets a focused task spec; isolated transcript | Inherits sandbox and approval setup; integration is closer to the parent
Concurrency model | Designed for parallel spawn (multiple agents at once) | Typically serial — one subagent at a time
Configuration surface | Custom agent types via Markdown definitions; declarative | Configuration files / flags; closer to programmatic

Claude Code — many small read-only specialists

The pattern this encourages: spawn a research subagent for a side question, an Explore subagent to map an unfamiliar area, a code-reviewer for a second opinion. Each child is cheap, scoped, and parallel-safe. The parent integrates summaries.

Codex CLI — deeper, more autonomous children

The pattern this encourages: hand off a self-contained slice of the task to a child that runs longer and may write within its sandbox. Parent gets the resulting branch state, not just a summary. Closer to a multi-process model than a multi-thread one.

Neither is “right”

The two designs reflect different bets about how teams use coding agents. Claude Code optimises for an interactive collaborator that asks for help mid-task; Codex CLI optimises for handing off bounded chunks of autonomous work. Treat them as different tools for different jobs, not as the same idea expressed differently.

06

A Pattern Catalogue You Can Steal

Five subagent patterns that recur across well-designed harnesses, plus the anti-pattern to avoid. Each pattern is a specific bind profile — not a generic “spawn an agent and hope”. A sketch of the profiles in code follows the catalogue.

Pattern A — research / side-question

Read-only tools, 5–10 turn budget, returns a 200-token summary. Used when the parent needs a fact (“which file configures X?”) without polluting its own context. Safe, cheap, parallel-friendly.

Pattern B — parallel-edit fan-out

One child per file (or per directory), each with write access scoped to its target. Same task description for all (“rename FooBar to BarFoo”). Parent integrates by checking each returned diff.

Pattern C — second-opinion review

A reviewer subagent receives a diff and produces a structured critique. No write tools. Often invoked after the parent thinks it’s done; the parent revises based on findings.

Pattern D — long-running autonomous slice

Codex-style. The parent hands off a contained task (“migrate this module to async”) to a child with full tool access in a fresh worktree. The child runs to completion; parent sees the resulting branch.

Pattern E — planner / executor split

A reasoning-model planner produces a step-by-step plan; non-reasoning executor children run each step. Cheap fast model executes; expensive thinking model only plans. A tiered architecture.

Pattern F — the anti-pattern

“Just spawn another agent and tell it to figure it out.” No bind profile, no scoped task, no return contract. This is the failure mode — and unfortunately the easiest mistake to make. Always design the binding before you write the spawn call.
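To make the catalogue concrete, here is one way the five healthy patterns might be written down as bind profiles; every number and tool name below is a placeholder, not any tool’s real configuration:

# Illustrative bind profiles, one per pattern (all values are placeholders).
BIND_PROFILES = {
    "research":      {"tools": ["read_file", "grep"],                "max_turns": 8,  "approval": "read-only"},
    "parallel_edit": {"tools": ["read_file", "write_file"],          "max_turns": 12, "approval": "inherit"},
    "review":        {"tools": ["read_file"],                        "max_turns": 6,  "approval": "read-only"},
    "autonomous":    {"tools": ["read_file", "write_file", "shell"], "max_turns": 40, "approval": "inherit"},
    "executor":      {"tools": ["read_file", "write_file"],          "max_turns": 10, "approval": "inherit"},
}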

07

Interactive: The Subagent Tree

Each pattern produces a different shape of agent tree. Pick a pattern below to see how the parent hands off, what binding looks like, and how results merge back.

What to notice

The tree shape is not a free parameter — it is determined by the kind of work. Research is one-deep, narrow. Parallel-edit is one-deep, wide. Autonomous slice is one-deep but each child is heavyweight. Planner/executor adds a planning hop. When you find yourself drawing a different shape, it usually corresponds to a different pattern.
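As a rough sketch of those shapes:

research:          parent ── one small read-only child                  (one-deep, narrow)
parallel-edit:     parent ─┬─ edit child 1
                           ├─ edit child 2
                           └─ edit child N                              (one-deep, wide)
autonomous slice:  parent ── one heavyweight child in its own worktree  (one-deep, heavy)
planner/executor:  planner ── plan ─┬─ executor 1
                                    └─ executor N                       (planning hop, then fan-out)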

08

Synthesis — All Six Components Together

One last pass over Raschka’s framework, with a single sentence per component capturing what we’ve learned across the series. Read it as a checklist for any harness you build or evaluate.

COMPONENT 01

Live Repo Context

Build a stable, cheap workspace summary (layout + commands + conventions) so the agent inherits what a colleague would already know.

COMPONENT 02

Prompt Shape & Cache

Sort the prompt by stability and put a cache breakpoint after the last stable layer; pay full price only for what genuinely changed this turn.

COMPONENT 03

Tool Access

Replace prose suggestions with named, schema-checked actions; gate them with the four validation questions and a clear approval UX.

COMPONENT 04

Context Bloat

Clip individual tool outputs, dedupe repeated reads, summarise old turns with recency weighting; treat “what the model sees” as a budget.

COMPONENT 05

Session Memory

Maintain a small structured working memory (task, files, decisions, next) on disk that survives compaction and carries across turns.

COMPONENT 06

Subagents

Delegate side questions and parallel slices to bounded children; the spawn API is one line, the binding policy is the entire design.

A useful test for any harness

Pick the agent tool you use most. Write down what it does for each of the six components. Where it’s strong and where it’s weak should jump out. Now do the same for a harness you’d build — and notice which component you have the least concrete answer for. That is, almost always, where to start.

The deeper claim, restated

Raschka’s essay is, in the end, a quiet reframing of where the engineering is happening in modern LLM applications. The model is the engine, but the engine is also a commodity that improves on its own quarterly schedule. The harness is the durable artefact. The choices made about context, tools, memory, and delegation outlast any specific model version. Building good harnesses is the new applied software engineering of LLM systems — and the article is the clearest map I’ve seen of that territory.

09

Reading List & Where to Go Next

The article itself

Production harness sources to study

Adjacent essays and papers

Where to go from here in this hub

A closing prompt

Pick one component you’ve been least intentional about in your own work. Spend an afternoon making it explicit — write a CLAUDE.md, profile your prompt, design a tool schema, write a clipping function. Watch what happens to the agent’s behaviour. The components are not abstractions; they are the levers that move the system. Pick one and pull it.

Back to Deck 03 — Components Overview