SERIES: ANTHROPIC CLAUDE  |  07 of 10

Agents & Multi-Agent Systems

Autonomous AI That Plans, Acts & Collaborates

Building intelligent systems with Claude's agentic capabilities

agents orchestration multi-agent claude-sdk
FOUNDATIONS

What Are AI Agents?

An AI agent is a system that uses an LLM to autonomously decide which actions to take, execute those actions, observe the results, and iterate toward a goal.

Definition

  • Goal-directed autonomous systems
  • Use LLMs as reasoning engines
  • Can use tools & take actions
  • Observe outcomes & adapt

Autonomy Levels

  • L1: Human approves every step
  • L2: Human approves key decisions
  • L3: Agent acts, human monitors
  • L4: Fully autonomous execution

Agent vs Chatbot

  • Chatbot: single turn Q&A
  • Agent: multi-step task execution
  • Chatbot: no tool use
  • Agent: plans, acts, iterates
Capability      | Chatbot             | Agent
Tool usage      | None or limited     | Extensive, dynamic tool selection
Planning        | Single response     | Multi-step plan generation
Iteration       | Stateless per turn  | Loops until goal is achieved
Error recovery  | User must re-prompt | Self-correcting behavior
CORE CONCEPT

The Agent Loop

Every agent follows a fundamental observe → think → act → iterate cycle until the task is complete or a stopping condition is met.

  • Observe: Gather context, read environment
  • Think: Plan next step, select tools
  • Act: Execute tool call, make changes
  • Iterate: Check results; continue, or stop when the goal is reached
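The loop above can be sketched in plain Python. Here the hypothetical `think`, `act`, and `observe` callables stand in for the LLM and its tools; the step budget is the stopping condition alongside goal completion:

```python
def run_agent_loop(think, act, observe, max_steps=10):
    """Generic observe -> think -> act loop with a step budget.

    think(observation) returns the next action, or None when the goal
    is reached. act(action) performs it; observe() gathers fresh context.
    """
    for _ in range(max_steps):
        observation = observe()       # OBSERVE: gather context
        action = think(observation)   # THINK: plan the next step
        if action is None:            # stopping condition: goal reached
            return "goal_reached"
        act(action)                   # ACT: execute, then ITERATE
    return "step_budget_exhausted"

# Toy example: "act" increments a counter until the observation hits 3.
state = {"n": 0}
status = run_agent_loop(
    think=lambda obs: None if obs >= 3 else "increment",
    act=lambda action: state.update(n=state["n"] + 1),
    observe=lambda: state["n"],
)
```

Real agents replace `think` with an LLM call and `act` with tool execution, but the control flow is the same.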
CLAUDE

Claude as an Agent

Claude is purpose-built for agentic workloads — its architecture and training make it one of the strongest foundation models for autonomous task execution.

Why Claude Excels

  • Extended thinking for complex reasoning chains
  • Tool use deeply integrated into the model
  • Strong instruction following & planning
  • Reliable structured output (JSON, XML)
  • Long context window for large codebases

Agentic Capabilities

  • Code execution — write, run, debug
  • File manipulation — read, edit, create
  • Search — grep, glob, web search
  • Shell access — bash commands
  • Multi-turn — maintains context across steps

Planning & Self-Correction

Claude naturally breaks down complex tasks into sub-tasks, validates intermediate results, backtracks when errors are detected, and provides reasoning transparency through extended thinking.

CONFIGURATION

agents.md — Your Agent's Identity File

The agents.md (or AGENTS.md) file tells Claude who it is, what it can do, and how it should behave within a project. It lives at the root of your repository.

# AGENTS.md

## Identity
You are a senior backend engineer assistant
specializing in Python microservices.

## Capabilities
- Read and modify files in src/
- Run tests with pytest
- Access the staging database (read-only)
- Create pull requests via GitHub CLI

## Guidelines
- Always run tests before committing
- Follow the project's PEP 8 style guide
- Never modify production configs
- Ask before deleting files

## Project Context
- Python 3.12 + FastAPI
- PostgreSQL with SQLAlchemy ORM
- Docker Compose for local dev

What It Is

A markdown file that provides Claude with persistent context about your project and its role

Purpose

  • Defines the agent's persona
  • Declares available capabilities
  • Sets behavioral boundaries
  • Provides project-specific context

How to Write One

  • Be specific about tools & permissions
  • Include project conventions
  • Define what the agent should NOT do
  • Keep it concise but comprehensive
DEEP DIVE

agents.md Structure

Identity & Role

  • Who the agent is
  • Domain expertise
  • Tone and communication style
  • Target audience

## Identity

Capabilities Declaration

  • Tools the agent may use
  • Files/directories accessible
  • APIs & services available
  • Permission boundaries

## Capabilities

Behavioral Guidelines

  • Coding conventions
  • Testing requirements
  • Things to avoid
  • Escalation rules

## Guidelines

Project Context

  • Tech stack & frameworks
  • Architecture overview
  • Directory structure
  • Build & deploy process

Restrictions & Safety

  • Files never to modify
  • Actions requiring approval
  • Sensitive data handling
  • Production safeguards

Claude Code automatically reads AGENTS.md, .claude/agents.md, and any AGENTS.md files in parent directories, merging them hierarchically.

SDK

Claude Agent SDK

The Anthropic Agent SDK provides a Python framework for building production-grade agents with Claude, including tool management, orchestration, and guardrails.

Overview

  • Open-source Python SDK
  • Built on top of the Anthropic API
  • First-class tool use support
  • Multi-agent orchestration built in
  • Streaming & async support

What It Enables

  • Declarative agent definitions
  • Typed tool schemas with validation
  • Agent handoffs & delegation
  • Guardrail integration
  • Tracing & observability

Installation

# Install the SDK
pip install anthropic-agents

# Or with extras
pip install anthropic-agents[all]

# Verify installation
python -c "import agents; print('OK')"

Key Components

  • Agent — core agent class
  • Runner — execution engine
  • Tool — tool definitions
  • Guardrail — safety checks
  • Handoff — agent delegation
CODE

Building Your First Agent

from agents import Agent, Runner, tool

@tool
def read_file(path: str) -> str:
    """Read contents of a file."""
    with open(path, 'r') as f:
        return f.read()

@tool
def write_file(path: str, content: str) -> str:
    """Write content to a file."""
    with open(path, 'w') as f:
        f.write(content)
    return f"Wrote {len(content)} bytes to {path}"

@tool
def run_tests() -> str:
    """Run the project test suite."""
    import subprocess
    result = subprocess.run(
        ["pytest", "--tb=short", "-q"],
        capture_output=True, text=True
    )
    return result.stdout + result.stderr

# Define the agent
coding_agent = Agent(
    name="CodingAssistant",
    instructions="""You are a Python developer.
    Read files, make changes, then run tests
    to verify your work.""",
    tools=[read_file, write_file, run_tests],
    model="claude-sonnet-4-20250514",
)

# Run the agent
result = Runner.run_sync(
    coding_agent,
    "Fix the bug in src/parser.py where "
    "empty input causes a crash"
)
print(result.final_output)

Step-by-Step

  • 1. Define tools with @tool decorator
  • 2. Create an Agent with instructions
  • 3. Attach tools to the agent
  • 4. Run with Runner.run_sync()

What Happens

The agent enters its loop: reads the file, identifies the bug, writes a fix, runs tests, and iterates until tests pass.

Key Features Used

  • Typed tool parameters
  • Auto-generated tool schemas
  • Agent loop with self-correction
  • Natural language instructions
TOOLS

Agent Tools

Tools give agents the ability to interact with the real world. Each tool has a name, description, input schema, and an execution function.

# Simple tool with the @tool decorator
@tool
def search_codebase(
    query: str,
    file_type: str = "py"
) -> str:
    """Search the codebase for a pattern.

    Args:
        query: Regex pattern to search for
        file_type: File extension filter
    """
    import subprocess
    result = subprocess.run(
        ["rg", query, "--type", file_type,
         "--json"],
        capture_output=True, text=True
    )
    return result.stdout

# Tool with complex schema
@tool
def create_github_issue(
    title: str,
    body: str,
    labels: list[str] | None = None,
    assignees: list[str] | None = None
) -> str:
    """Create a GitHub issue."""
    # Implementation here
    return f"Created issue: {title}"

Defining Custom Tools

  • Use @tool decorator for simplicity
  • Type hints become the JSON schema
  • Docstring becomes the description
  • Return value goes back to the agent

Tool Schema (auto-generated)

{
  "name": "search_codebase",
  "description": "Search the codebase...",
  "parameters": {
    "query": {"type": "string"},
    "file_type": {
      "type": "string",
      "default": "py"
    }
  }
}
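A minimal sketch of how a decorator could derive such a schema from type hints and the docstring using the standard library (the SDK's actual internals may differ):

```python
import inspect

# Map Python annotations to JSON-schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Build a JSON-schema-like dict from a function's signature and docstring."""
    params = {}
    for name, p in inspect.signature(fn).parameters.items():
        spec = {"type": PY_TO_JSON.get(p.annotation, "string")}
        if p.default is not inspect.Parameter.empty:
            spec["default"] = p.default   # optional parameter: record default
        params[name] = spec
    doc = (fn.__doc__ or "").strip()
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": params,
    }

def search_codebase(query: str, file_type: str = "py") -> str:
    """Search the codebase for a pattern."""
    return ""

schema = tool_schema(search_codebase)
```

This is why precise type hints and docstrings matter: they are the only description of the tool the model ever sees.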

Execution Flow

Agent decides to call tool → SDK validates inputs → function executes → result returned to agent → agent continues reasoning

ARCHITECTURE

Multi-Agent Architectures

Complex tasks benefit from multiple specialized agents working together. Three primary patterns dominate multi-agent design.

Orchestrator

Orchestrator → Agent A | Agent B | Agent C

Central agent delegates subtasks to specialized workers

Pipeline

Stage 1 → Stage 2 → Stage 3 → Stage 4

Each agent processes output from the previous stage sequentially

Peer-to-Peer

Agent A ↔ Agent B ↔ Agent C

Agents communicate directly, no central coordinator needed
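Of the three, the pipeline is the simplest to sketch: it reduces to feeding each stage's output into the next. A toy version with plain callables standing in for agents:

```python
from functools import reduce

def run_pipeline(stages, initial_input):
    """Feed each stage's output to the next, like a sequential agent chain."""
    return reduce(lambda data, stage: stage(data), stages, initial_input)

# Hypothetical stages: draft -> review -> format
draft = lambda spec: f"draft of {spec}"
review = lambda text: text + " (reviewed)"
fmt = lambda text: text.upper()

out = run_pipeline([draft, review, fmt], "login page")
```

In a real system each stage would be a `Runner.run(...)` call; the composition logic stays the same.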

PATTERNS

Agent Communication

Multi-agent systems need well-defined communication protocols to share information, delegate work, and coordinate effectively.

Message Passing

  • Structured messages between agents
  • Task descriptions with context
  • Results & status updates
  • Error reports & retries
result = await runner.handoff(
    target=review_agent,
    message="Review this PR",
    context={"diff": diff}
)

Shared Context

  • Common memory store
  • Shared file system access
  • Database for state persistence
  • Event logs all agents can read
ctx = SharedContext()
ctx.set("plan", plan_output)
# Other agents can read:
plan = ctx.get("plan")
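The `SharedContext` used above is not spelled out; a minimal thread-safe version, assuming a simple key-value contract, could look like this:

```python
import threading

class SharedContext:
    """Minimal thread-safe key-value store that multiple agents can share."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:          # serialize writers
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:          # consistent reads under concurrent writes
            return self._data.get(key, default)

ctx = SharedContext()
ctx.set("plan", ["explore", "migrate", "verify"])
```

A production version would typically back this with Redis or a database so state survives process restarts.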

Handoff Protocols

  • Explicit agent-to-agent delegation
  • Transfer of conversation control
  • Context preservation on handoff
  • Return results to caller
agent = Agent(
    name="Router",
    handoffs=[
        billing_agent,
        support_agent,
        technical_agent,
    ]
)
META

Claude Code as an Agent

Claude Code is itself a fully-featured AI agent — it uses the agent loop pattern with a rich set of built-in tools to accomplish software engineering tasks.

Built-in Agent Tools

Tool  | Purpose
Read  | Read files from the filesystem
Edit  | Make precise edits to files
Write | Create new files
Bash  | Execute shell commands
Glob  | Find files by pattern
Grep  | Search file contents

How It Works

  • Observe: Reads files, searches codebase, checks git status
  • Think: Analyzes code, plans changes, considers edge cases
  • Act: Edits files, runs commands, creates commits
  • Iterate: Verifies changes, fixes errors, re-runs tests

Sub-agent Capability

Claude Code can spawn sub-agents for specialized tasks — a dedicated agent for exploration, another for planning, and task-specific workers. This enables parallel investigation and division of labor within a single session.

PATTERNS

Sub-agent Patterns

Sub-agents are specialized, scoped instances that handle specific aspects of a larger task. They inherit context but have focused instructions and limited tool access.

Explore Agent

  • Read-only access to codebase
  • Searches for patterns & dependencies
  • Maps architecture & data flow
  • Returns structured findings

Tools: Read, Glob, Grep

Plan Agent

  • Receives exploration results
  • Creates step-by-step implementation plan
  • Identifies risks & dependencies
  • Estimates complexity per step

Tools: Read (for validation)

Task Agent

  • Executes one step of the plan
  • Full read/write tool access
  • Reports success or failure
  • Scoped to specific files/modules

Tools: Read, Edit, Write, Bash

Delegation Pattern

explorer = Agent(name="Explorer", tools=[Read, Glob, Grep],
                 instructions="Find all usages of the deprecated API...")

planner = Agent(name="Planner", tools=[Read],
                instructions="Create a migration plan based on findings...")

worker = Agent(name="Worker", tools=[Read, Edit, Write, Bash],
               instructions="Execute step N of the migration plan...")

# Orchestrator delegates sequentially
findings = await Runner.run(explorer, "Map deprecated API usage")
plan = await Runner.run(planner, f"Plan migration: {findings.output}")
result = await Runner.run(worker, f"Execute: {plan.output}")
ORCHESTRATION

Agent Orchestration

Orchestration manages the lifecycle, coordination, and execution of multiple agents working toward a shared objective.

Managing Multiple Agents

from agents import Agent, Runner

orchestrator = Agent(
    name="ProjectManager",
    instructions="""You coordinate a team:
    - frontend_agent: React components
    - backend_agent: API endpoints
    - test_agent: Integration tests
    Delegate tasks appropriately.""",
    handoffs=[
        frontend_agent,
        backend_agent,
        test_agent,
    ]
)

# The orchestrator decides who to call
result = await Runner.run(
    orchestrator,
    "Add a user profile page with API"
)

Parallel Execution

  • Independent tasks run simultaneously
  • Reduces total wall-clock time
  • Results aggregated by orchestrator
  • Failures isolated per agent
import asyncio
tasks = [
    Runner.run(lint_agent, code),
    Runner.run(test_agent, code),
    Runner.run(docs_agent, code),
]
results = await asyncio.gather(*tasks)

Coordination Strategies

  • Sequential: A → B → C (pipeline)
  • Parallel: A | B | C (fan-out)
  • Conditional: If A fails, try B
  • Iterative: Repeat until quality bar met
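The iterative strategy can be sketched as a retry loop gated on a quality check. Here the hypothetical `produce` callable stands in for an agent run (seeded with its previous attempt) and `score` for an evaluator agent:

```python
def iterate_until_quality(produce, score, threshold, max_rounds=5):
    """Re-run an agent, feeding back its last attempt, until quality passes."""
    attempt = None
    for round_num in range(1, max_rounds + 1):
        attempt = produce(attempt)        # agent run, seeded with feedback
        if score(attempt) >= threshold:   # quality bar met: stop iterating
            return attempt, round_num
    return attempt, max_rounds            # bar never met: return best effort

# Toy example: each round improves the score by 2; bar is 6.
result, rounds = iterate_until_quality(
    produce=lambda prev: (prev or 0) + 2,
    score=lambda attempt: attempt,
    threshold=6,
)
```

The `max_rounds` cap doubles as the cost guardrail mentioned later: without it, a quality bar the agent cannot reach becomes a runaway loop.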
SAFETY

Guardrails & Safety

Autonomous agents require robust safety measures. Guardrails prevent harmful actions, enforce policies, and maintain human oversight.

Permission Systems

  • Tool-level access control
  • File/directory allow-lists
  • Network access restrictions
  • Tiered permission levels
  • Principle of least privilege

Sandboxing

  • Docker containers for execution
  • Read-only filesystem mounts
  • Network isolation
  • Resource limits (CPU, memory)
  • Timeout enforcement
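Timeout enforcement, at minimum, can be done at the subprocess level; a sketch for wrapping shell tools so a hung command reports a failure instead of stalling the agent:

```python
import subprocess

def run_sandboxed(cmd, timeout_s=10):
    """Run a shell tool with a hard timeout; report rather than hang."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return {"ok": proc.returncode == 0, "output": proc.stdout}
    except subprocess.TimeoutExpired:
        # The child is killed; the agent sees a structured failure it can react to.
        return {"ok": False, "output": f"timed out after {timeout_s}s"}

result = run_sandboxed(["echo", "hello"])
```

Container isolation, read-only mounts, and CPU/memory limits sit below this layer (e.g. in the Docker runtime), but the timeout alone already prevents the most common hang.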

Human-in-the-Loop

  • Approval for destructive actions
  • Review before git push
  • Confirmation for API calls
  • Escalation on uncertainty
  • Audit trail for all actions
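One way to implement the approval gate is a predicate checked before every tool call; the tool names and the shell heuristic below are illustrative, not a fixed policy:

```python
# Hypothetical set of tools considered destructive in this project.
DESTRUCTIVE = {"delete_file", "drop_table", "git_push", "send_email"}

def requires_approval(tool_name, args):
    """Return True when a human must confirm before the tool executes."""
    if tool_name in DESTRUCTIVE:
        return True
    # Heuristic: recursive deletes via the shell are always gated.
    if tool_name == "bash" and "rm -rf" in args.get("command", ""):
        return True
    return False

gated = requires_approval("bash", {"command": "rm -rf build/"})
```

The agent runtime would pause on `True`, surface the proposed call to a human, and only execute after explicit confirmation, logging the decision for the audit trail.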

Guardrail Implementation

import re

from agents import Agent, GuardrailResult, guardrail

@guardrail
async def no_secrets_guardrail(ctx, agent, input_text) -> GuardrailResult:
    """Prevent the agent from outputting secrets or credentials."""
    patterns = [r"sk-[a-zA-Z0-9]{48}", r"AKIA[0-9A-Z]{16}", r"ghp_[a-zA-Z0-9]{36}"]
    for pattern in patterns:
        if re.search(pattern, input_text):
            return GuardrailResult(
                should_block=True,
                message="Blocked: potential secret detected",
            )
    return GuardrailResult(should_block=False)

agent = Agent(name="SafeAgent", guardrails=[no_secrets_guardrail])
STATE

Agent Memory & State

Effective agents maintain context across interactions. Memory systems range from simple conversation history to persistent knowledge bases.

Conversation History

  • Full message log per session
  • Tool calls & results preserved
  • Summarization for long sessions
  • Context window management
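A common first step in context window management is trimming: keep the system prompt plus the most recent turns and drop (or summarize) the rest. A sketch over the familiar role/content message format:

```python
def trim_history(messages, max_messages=6):
    """Keep the system prompt plus the most recent conversation turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]   # drop the oldest non-system turns

# Toy session: one system prompt plus ten user turns.
history = [{"role": "system", "content": "You are an agent."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(10)]
trimmed = trim_history(history, max_messages=3)
```

Production agents usually summarize the dropped turns instead of discarding them outright, so earlier findings survive in compressed form.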

Persistent Memory

  • Cross-session knowledge store
  • User preferences & patterns
  • Project-specific learnings
  • Claude Code: ~/.claude/memory

Context Management

  • Prioritize recent & relevant info
  • Evict stale context intelligently
  • Compress repeated patterns
  • Key-value scratchpad for state

Memory Architecture

# Working memory (within session)
agent.context["current_task"] = task
agent.context["findings"] = []

# Persistent memory (across sessions)
memory_store = VectorMemory(
    path="./agent_memory",
    embedding_model="voyage-3"
)
memory_store.add("User prefers tabs")
relevant = memory_store.search(query)

Common Pitfalls

  • Context overflow — exceeding token limits causes lost information
  • Stale state — outdated memory leads to wrong decisions
  • No summarization — raw history is expensive
  • Missing persistence — losing learnings between sessions
EXAMPLES

Real-world Agent Examples

Code Review Bot

  • Triggered on PR creation
  • Reads diff & full file context
  • Checks style, logic, security
  • Posts inline review comments
  • Suggests specific fixes

Tools: GitHub API, Read, Grep

Model: Claude Sonnet (cost-effective)

Data Pipeline Agent

  • Monitors data quality metrics
  • Detects schema changes
  • Auto-generates transformations
  • Runs validation queries
  • Alerts on anomalies

Tools: SQL, Python, Slack API

Model: Claude Haiku (low-latency)

Customer Support

  • Triages incoming tickets
  • Searches knowledge base
  • Drafts responses for review
  • Escalates complex issues
  • Learns from past resolutions

Tools: KB Search, CRM, Email

Model: Claude Sonnet + handoffs

Common Architecture

All three follow the same pattern: trigger event → gather context → reason about action → execute with tools → verify & report. The difference is in the tools available and the domain-specific instructions.

DEBUG

Debugging Agents

Agents are harder to debug than traditional software because behavior emerges from LLM reasoning. Observability is the key to understanding agent decisions.

Observability & Logging

from agents import Agent, Runner
from agents.tracing import TracingConfig

# Enable detailed tracing
config = TracingConfig(
    log_level="DEBUG",
    trace_tools=True,
    trace_reasoning=True,
    output_dir="./traces"
)

result = await Runner.run(
    agent, prompt,
    tracing=config
)

# Inspect the trace
for step in result.trace.steps:
    print(f"[{step.type}] {step.summary}")
    if step.type == "tool_call":
        print(f"  Tool: {step.tool_name}")
        print(f"  Input: {step.input}")
        print(f"  Output: {step.output[:200]}")
    elif step.type == "reasoning":
        print(f"  Thought: {step.text[:200]}")

What to Look For

  • Tool selection errors — wrong tool for the job
  • Input malformation — bad parameters to tools
  • Reasoning loops — agent repeating the same action
  • Context loss — forgetting earlier findings
  • Premature stopping — declaring done too early
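Reasoning loops, in particular, are easy to catch mechanically once traces are available: flag any run where the same (tool, input) pair repeats back-to-back. A sketch over a list of tool-call records:

```python
def detect_repeat_loop(tool_calls, window=3):
    """True if the same (tool, input) pair occurs `window` times in a row."""
    for i in range(len(tool_calls) - window + 1):
        chunk = tool_calls[i:i + window]
        if len(set(chunk)) == 1:   # identical calls repeated consecutively
            return True
    return False

# Toy trace: the agent re-runs the failing test suite without changing anything.
trace = [
    ("read_file", "a.py"),
    ("run_tests", ""),
    ("run_tests", ""),
    ("run_tests", ""),
]
looping = detect_repeat_loop(trace)
```

Wiring a check like this into the runner lets you abort or escalate instead of burning tokens on a stuck agent.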

Tracing Agent Decisions

  • Log every LLM call with full prompt
  • Record tool inputs & outputs
  • Track token usage per step
  • Measure latency per action
  • Compare traces between runs

Common Failure Modes

  • Infinite loops on error recovery
  • Hallucinating file paths or APIs
  • Misinterpreting tool output format
  • Overly broad tool calls (e.g., rm -rf)
PERFORMANCE

Agent Performance

Agent systems involve tradeoffs between latency, cost, and accuracy. Understanding these tradeoffs is critical for production deployments.

Latency Optimization

  • Use faster models for simple steps
  • Parallelize independent tool calls
  • Cache repeated tool results
  • Minimize context window size
  • Stream responses to users
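Caching repeated tool results is straightforward when the tool is deterministic: memoize on the tool name plus its serialized arguments. A sketch (names are illustrative):

```python
import json

class ToolCache:
    """Memoize deterministic tool calls within a session."""

    def __init__(self):
        self._cache = {}
        self.hits = 0

    def call(self, tool, **kwargs):
        # Key on the tool name plus a canonical serialization of its args.
        key = (tool.__name__, json.dumps(kwargs, sort_keys=True))
        if key in self._cache:
            self.hits += 1
        else:
            self._cache[key] = tool(**kwargs)
        return self._cache[key]

executions = []
def read_file(path):
    executions.append(path)            # track real executions
    return f"contents of {path}"

cache = ToolCache()
cache.call(read_file, path="a.py")
cache.call(read_file, path="a.py")     # second call served from cache
```

Only cache tools whose output cannot change mid-run (file reads are safe until the agent itself writes to the file; then the entry must be invalidated).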

Cost Management

  • Route to Haiku for triage tasks
  • Use Sonnet for most agent work
  • Reserve Opus for hard reasoning
  • Summarize context to reduce tokens
  • Set max iteration limits
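The routing rule can be as simple as a heuristic over task traits; the tier names follow the bullets above, and the word-count threshold is purely illustrative:

```python
def route_model(task, needs_deep_reasoning=False):
    """Pick a model tier from rough task traits (thresholds are illustrative)."""
    if needs_deep_reasoning:
        return "opus"    # reserve the largest model for hard reasoning
    if len(task.split()) < 10:
        return "haiku"   # short triage-style tasks go to the fast, cheap tier
    return "sonnet"      # default tier for most agent work

triage = route_model("label this ticket")
standard = route_model(
    "refactor the payment service to support idempotent retries across regions"
)
hard = route_model("prove the migration preserves invariants",
                   needs_deep_reasoning=True)
```

In practice teams often replace the heuristic with a small classifier, or let an orchestrator agent pick the tier per sub-task.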

Accuracy Tradeoffs

  • More iterations = higher accuracy
  • Bigger model = better reasoning
  • More context = better decisions
  • Verification steps add cost but improve quality
  • Self-consistency checks improve reliability
Strategy                          | Latency Impact | Cost Impact | Accuracy Impact
Model routing (Haiku/Sonnet/Opus) | -40%           | -60%        | -10%
Parallel tool execution           | -50%           | Neutral     | Neutral
Context summarization             | -20%           | -30%        | -5%
Verification loop                 | +30%           | +25%        | +15%
Extended thinking                 | +20%           | +15%        | +20%
ENTERPRISE

Enterprise Agent Deployment

Moving agents from prototype to production requires attention to scaling, monitoring, and compliance.

Scaling

  • Horizontal scaling with worker pools
  • Queue-based task distribution
  • Rate limiting per user/team
  • Graceful degradation under load
  • Auto-scaling based on queue depth
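Queue-based task distribution can be sketched with the standard library: a fixed worker pool drains a shared queue, and the number of workers caps concurrent agent runs. The `handle` callable below stands in for an agent invocation:

```python
import queue
import threading

def run_worker_pool(tasks, handle, num_workers=4):
    """Distribute agent tasks to a fixed worker pool via a shared queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()   # queue is fully loaded before start
            except queue.Empty:
                return                   # no work left: worker exits
            out = handle(task)           # stand-in for an agent run
            with lock:
                results.append(out)
            q.task_done()

    for t in tasks:
        q.put(t)
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

results = run_worker_pool(range(10), handle=lambda t: t * 2, num_workers=3)
```

A production setup swaps the in-process queue for a broker like Redis (as in the deployment sketch below the checklist) so workers can scale across machines.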

Monitoring

  • Dashboard for agent health metrics
  • Success/failure rate tracking
  • Token usage & cost attribution
  • Latency percentiles (p50, p95, p99)
  • Alerting on anomalous behavior

Compliance

  • Audit logs for all agent actions
  • Data residency requirements
  • PII detection & redaction
  • Role-based access controls
  • SOC 2 / HIPAA compliance paths

Infrastructure Pattern

# Agent deployment architecture
services:
  agent-gateway:
    image: agent-gateway:latest
    replicas: 3
    environment:
      - ANTHROPIC_API_KEY=${API_KEY}
      - MAX_CONCURRENT_AGENTS=50
  agent-worker:
    image: agent-worker:latest
    replicas: 10
    resources:
      limits: { memory: "2Gi", cpu: "1" }
  redis:
    image: redis:7-alpine  # Task queue

Production Checklist

  • ☐ Rate limits & circuit breakers
  • ☐ Timeout on all agent runs
  • ☐ Fallback behavior on API errors
  • ☐ Cost alerting per user/team
  • ☐ Secrets management (no hardcoded keys)
  • ☐ Sandbox all tool executions
  • ☐ Human review for sensitive actions
  • ☐ Retention policy for traces/logs
FUTURE

The Future of Agents

Agent capabilities are expanding rapidly. The next wave brings more autonomy, richer modalities, and collaborative agent ecosystems.

Autonomous Coding

  • End-to-end feature implementation
  • Agents that write & maintain tests
  • Self-reviewing pull requests
  • Automated bug triage & fixing
  • Continuous codebase improvement

Already emerging with Claude Code

Multi-Modal Agents

  • Vision: screenshot analysis, UI testing
  • Audio: voice-driven agent control
  • Video: monitoring & analysis
  • Combined modalities for richer understanding
  • Computer use & browser interaction

Claude's vision capabilities enable this today

Agent Swarms

  • Dozens of agents collaborating
  • Emergent problem-solving behavior
  • Specialized agent marketplaces
  • Cross-organization agent coordination
  • Self-organizing agent teams

Research frontier, early experiments underway

Key Trends to Watch

Longer autonomy windows — agents running for hours, not minutes. Better tool ecosystems — standardized tool interfaces (MCP, now under the Linux Foundation's Agentic AI Foundation). Agent-to-agent protocols — standardized communication. Formal verification — proving agent safety properties mathematically.

SUMMARY

Summary & Best Practices

Key Takeaways

  • Agents are LLMs + tools + loops
  • The observe-think-act cycle is fundamental
  • agents.md defines agent identity & behavior
  • The Claude Agent SDK enables production agents
  • Multi-agent systems unlock complex workflows
  • Safety guardrails are non-negotiable

Design Principles

  • Start simple — single agent first
  • Scope narrowly — clear tool boundaries
  • Fail gracefully — always have fallbacks
  • Log everything — observability first
  • Test iteratively — eval-driven development

Best Practices

  • Write clear, specific agent instructions
  • Use the right model tier for each task
  • Implement human-in-the-loop for high-stakes actions
  • Set iteration limits to prevent runaway costs
  • Version your agents.md alongside code
  • Monitor token usage and set budgets

Anti-patterns to Avoid

  • God agent — one agent that does everything
  • No guardrails — unrestricted tool access
  • Vague instructions — ambiguous agent prompts
  • No observability — can't debug what you can't see
  • Premature multi-agent — complexity without need

Next: 08 — Model Context Protocol (MCP)