Companion deck for Component 6 of Sebastian Raschka’s essay, plus a closing synthesis tying all six components together. Why delegating to a child agent is harder than spawning one, and what good harness design looks like once everything is on the table.
Most coding tasks fit in a single agent loop. The interesting ones don’t. Three failure modes push designers towards subagents:
The task is genuinely big — touch 30 files, read 50 test failures, follow a multi-hop dependency chain. Even with aggressive compaction (deck 06) the parent agent runs out of room. Delegation hands off pieces with their own context budget.
Mid-task, the agent needs to know “what configures the test database?”. Reading config files takes 6 turns and 4,000 tokens, after which the parent is back where it started but with a polluted context. A subagent answers in one summary line.
Three files need the same kind of edit; tests need to run while a build runs. A parent + N subagents run in parallel, each on a fast independent chain, then the parent integrates results.
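To make that fan-out/fan-in shape concrete, here is a minimal sketch; `spawn_subagent` is a hypothetical stub standing in for whatever spawn API a real harness exposes:

```python
import asyncio

async def spawn_subagent(role: str, task: str, max_turns: int) -> str:
    """Stub standing in for a real harness's spawn call (hypothetical)."""
    await asyncio.sleep(0)   # a real child would run its own agent loop here
    return f"[{role}] done: {task}"

async def parent(files: list[str]) -> list[str]:
    # Fan out: one bounded child per file, all running concurrently.
    children = [
        spawn_subagent(role="editor",
                       task=f"Apply the rename in {path}",
                       max_turns=10)
        for path in files
    ]
    # Fan in: the parent integrates summaries, never the children's transcripts.
    return await asyncio.gather(*children)

print(asyncio.run(parent(["models.py", "views.py", "tests/test_models.py"])))
```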
The article phrases the goal as: “parallelize certain work into subtasks via subagents and speed up the main task”. The word doing the most work in that sentence is “certain”. Subagents are not a generic decomposition tool; they are a targeted, bounded lever for specific patterns the harness recognises.
The article makes a sharp point: “not just how to spawn a subagent but also how to bind one”. Spawning is the easy part: instantiate a child agent with a fresh context. Binding is the hard part: decide what context, tools, approval policy, sandbox, and budget the child gets — and what flows back.
A poorly bound subagent is worse than no subagent. It either re-does work the parent already did (because too little context flowed in), or explodes the system’s effective context (because too much flowed back), or inherits write tools it didn’t need and corrupts state. The “spawn” API is one line of code. The binding policy is the design.
Two questions to answer for any subagent type: what flows in, and what flows out?
| Context layer | Typical inheritance | Why |
|---|---|---|
| Workspace summary | Always inherited | The child needs the same project context the parent used. |
| Tool definitions | Subset, not all | Read-only subagents shouldn’t have `write_file` in their schema at all. |
| Working memory | Snapshot at spawn time | The child uses it; doesn’t modify the parent’s. |
| Compact transcript | Usually not inherited | The child gets a focused task spec instead. Parent transcript is noise. |
| Task spec | Always | The single most important input: a one- to three-paragraph brief on what this child is for. |
| Approval policy | Inherited or tightened | Never relaxed below the parent’s; can be stricter (e.g., read-only). |
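One way to make those inheritance rules concrete is a declarative bind profile per subagent role. A sketch; every field and tool name here is illustrative, not any particular harness’s API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BindProfile:
    """What flows into a child at spawn time (illustrative fields)."""
    tools: tuple[str, ...]              # subset of the parent's tool schema
    inherit_workspace_summary: bool = True
    inherit_transcript: bool = False    # child gets a task spec instead
    approval_policy: str = "inherit"    # may be tightened, never relaxed
    max_turns: int = 8

PROFILES = {
    # Read-only research child: write tools absent from its schema entirely.
    "research": BindProfile(tools=("read_file", "grep", "list_dir")),
    # Editing child: write access, but behind a stricter approval gate.
    "editor": BindProfile(
        tools=("read_file", "write_file", "run_tests"),
        approval_policy="ask_before_write",
        max_turns=12,
    ),
}
```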
By far the most common return shape: a single string of 100–500 tokens. The child does many turns of work; the parent sees the result, not the work. This compresses rather than expands the parent’s context — the entire point of delegation.
Some subagents return more: a list of file paths, a JSON structure, a diff. Always give these a schema so the parent isn’t parsing free text. If the child wrote files, the side effect is the result; the return value is just “done, here’s a summary”.
{
    "name": "spawn_subagent",
    "args": {
        "role": "research",                        # selects bind profile
        "task": "Identify which file configures the test database fixture, "
                "and report its key options. Do not modify any files.",
        "important_files": ["tests/conftest.py"],  # inbound hint
        "max_turns": 8,                            # budget
        "return_schema": {
            "type": "object",
            "properties": {
                "file": {"type": "string"},
                "options": {"type": "object"},
                "summary": {"type": "string", "maxLength": 500}
            },
            "required": ["file", "summary"]
        }
    }
}
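On the way back, the parent can enforce that contract mechanically instead of parsing free text. A minimal sketch using the third-party `jsonschema` package, with the schema from the call above:

```python
import jsonschema  # third-party: pip install jsonschema

RETURN_SCHEMA = {
    "type": "object",
    "properties": {
        "file": {"type": "string"},
        "options": {"type": "object"},
        "summary": {"type": "string", "maxLength": 500},
    },
    "required": ["file", "summary"],
}

def accept_child_result(raw: dict) -> dict:
    # Reject a malformed return before it ever reaches the parent's context;
    # raises jsonschema.ValidationError on any contract violation.
    jsonschema.validate(instance=raw, schema=RETURN_SCHEMA)
    return raw
```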
Two safety properties harnesses get wrong if they aren’t designed in from the start.
If the parent runs in a Docker container or Git worktree, the child inherits the same sandbox by default. This is what Codex does explicitly: “subagents inherit sandbox and approval setup”. The child cannot widen the sandbox; the harness wires it that way.
Set a hard limit on recursion depth. Even depth 2 is unusual; depth 3+ is almost always a sign that the design is wrong (or the model is confused). A simple counter on the spawn tool prevents accidental fork-bombs.
If the depth-0 agent has 60 turns total, depth-1 children should each get a small fraction (say 8–12). The parent must be able to afford several spawns without exceeding its own budget. Concrete rule: child `max_turns` ≤ (parent remaining turns) / (expected fan-out + 2). If the parent doesn’t have the budget, it shouldn’t spawn.
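Both guards fit in a few lines at the spawn boundary. A sketch with illustrative numbers, not any specific harness’s logic:

```python
MAX_DEPTH = 2  # depth 3+ almost always signals a design problem

def spawn_budget(depth: int, parent_turns_left: int,
                 expected_fan_out: int) -> int | None:
    """Return the child's turn budget, or None if the spawn is disallowed."""
    if depth >= MAX_DEPTH:
        return None                            # hard cap on recursion
    budget = parent_turns_left // (expected_fan_out + 2)
    return budget if budget >= 1 else None     # parent can't afford the spawn

# A depth-0 parent with 60 turns left, fanning out to 3 children:
assert spawn_budget(0, 60, 3) == 12
```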
Raschka contrasts the two tools the article centres on. Both expose subagents, but the design choices differ in instructive ways.
| Property | Claude Code | Codex CLI |
|---|---|---|
| Subagent maturity | Long-standing — the Agent / Task tool has been part of the harness from early on | Added more recently — subagent support layered on top of an established CLI |
| Default scope | Read-only by default; specialised agent types unlock specific writes | Often not read-only; subagents can write within the sandbox |
| Inheritance | Inherits workspace context; gets a focused task spec; isolated transcript | Inherits sandbox and approval setup; integration is closer to the parent |
| Concurrency model | Designed for parallel spawn (multiple agents at once) | Typically serial — one subagent at a time |
| Configuration surface | Custom agent types via Markdown definitions; declarative | Codex configuration files / flags; closer to programmatic |
The pattern Claude Code encourages: spawn a research subagent for a side question, an Explore subagent to map an unfamiliar area, a code-reviewer for a second opinion. Each child is cheap, scoped, and parallel-safe. The parent integrates summaries.
The pattern Codex CLI encourages: hand off a self-contained slice of the task to a child that runs longer and may write within its sandbox. The parent gets the resulting branch state, not just a summary. Closer to a multi-process model than a multi-thread one.
The two designs reflect different bets about how teams use coding agents. Claude Code optimises for an interactive collaborator that asks for help mid-task; Codex CLI optimises for handing off bounded chunks of autonomous work. Treat them as different tools for different jobs, not as the same idea expressed differently.
Five subagent patterns that recur across well-designed harnesses. Each one is a specific bind profile — not a generic “spawn an agent and hope”.
Research: read-only tools, a 5–10 turn budget, returns a 200-token summary. Used when the parent needs a fact (“which file configures X?”) without polluting its own context. Safe, cheap, parallel-friendly.
Parallel edit: one child per file (or per directory), each with write access scoped to its target. Same task description for all (“rename FooBar to BarFoo”). The parent integrates by checking each returned diff.
Code review: a reviewer subagent receives a diff and produces a structured critique. No write tools. Often invoked after the parent thinks it’s done; the parent revises based on findings.
Autonomous slice: Codex-style. The parent hands off a contained task (“migrate this module to async”) to a child with full tool access in a fresh worktree. The child runs to completion; the parent sees the resulting branch.
Planner/executor: a reasoning-model planner produces a step-by-step plan; non-reasoning executor children run each step. The cheap, fast model executes; the expensive thinking model only plans. A tiered architecture.
And the anti-pattern: just spawn another agent and tell it to figure it out. No bind profile, no scoped task, no return contract. This is the failure mode — and unfortunately the easiest mistake to make. Always design the binding before you write the spawn call.
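One way to make that mistake structurally impossible: refuse any spawn that doesn’t name a bind profile and a return contract. A sketch, reusing the illustrative `PROFILES` table from the bind-profile sketch above:

```python
def spawn(role: str, task: str, return_schema: dict | None):
    """Hypothetical spawn entry point: binding is mandatory, not optional."""
    profile = PROFILES.get(role)   # the illustrative table defined earlier
    if profile is None:
        raise ValueError(f"no bind profile for role {role!r}; "
                         "design the binding before writing the spawn call")
    if not return_schema:
        raise ValueError("a return contract (schema) is required")
    # ... instantiate the child with profile.tools, profile.max_turns, etc.
```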
Each pattern produces a different shape of agent tree: how the parent hands off, what binding looks like, and how results merge back all follow from it.
The tree shape is not a free parameter — it is determined by the kind of work. Research is one-deep, narrow. Parallel-edit is one-deep, wide. Autonomous slice is one-deep but each child is heavyweight. Planner/executor adds a planning hop. When you find yourself drawing a different shape, it usually corresponds to a different pattern.
One last pass over Raschka’s framework, with a single sentence per component capturing what we’ve learned across the series. Read it as a checklist for any harness you build or evaluate.
Build a stable, cheap workspace summary (layout + commands + conventions) so the agent inherits what a colleague would already know.
Sort the prompt by stability and put a cache breakpoint after the last stable layer; pay full price only for what genuinely changed this turn.
Replace prose suggestions with named, schema-checked actions; gate them with the four validation questions and a clear approval UX.
Clip individual tool outputs, dedupe repeated reads, summarise old turns with recency weighting; treat “what the model sees” as a budget.
Maintain a small structured working memory (task, files, decisions, next) on disk that survives compaction and carries across turns.
Delegate side questions and parallel slices to bounded children; the spawn API is one line, the binding policy is the entire design.
Pick the agent tool you use most. Write down what it does for each of the six components. Where it’s strong and where it’s weak should jump out. Now do the same for a harness you’d build — and notice which component you have the least concrete answer for. That is, almost always, where to start.
Raschka’s essay is, in the end, a quiet reframing of where the engineering is happening in modern LLM applications. The model is the engine, but the engine is also a commodity that improves on its own quarterly schedule. The harness is the durable artefact. The choices made about context, tools, memory, and delegation outlast any specific model version. Building good harnesses is the new applied software engineering of LLM systems — and the article is the clearest map I’ve seen of that territory.
Pick one component you’ve been least intentional about in your own work. Spend an afternoon making it explicit — write a CLAUDE.md, profile your prompt, design a tool schema, write a clipping function. Watch what happens to the agent’s behaviour. The components are not abstractions; they are the levers that move the system. Pick one and pull it.