SERIES: ANTHROPIC CLAUDE  |  07 of 10

Agents & Multi-Agent Systems

Autonomous AI That Plans, Acts & Collaborates

Building intelligent systems with Claude's agentic capabilities

agents orchestration multi-agent claude-sdk
FOUNDATIONS

What Are AI Agents?

An AI agent is a system that uses an LLM to autonomously decide which actions to take, execute those actions, observe the results, and iterate toward a goal.

Definition

  • Goal-directed autonomous systems
  • Use LLMs as reasoning engines
  • Can use tools & take actions
  • Observe outcomes & adapt

Autonomy Levels

  • L1: Human approves every step
  • L2: Human approves key decisions
  • L3: Agent acts, human monitors
  • L4: Fully autonomous execution

Agent vs Chatbot

  • Chatbot: single turn Q&A
  • Agent: multi-step task execution
  • Chatbot: no tool use
  • Agent: plans, acts, iterates
Capability      | Chatbot             | Agent
Tool usage      | None or limited     | Extensive, dynamic tool selection
Planning        | Single response     | Multi-step plan generation
Iteration       | Stateless per turn  | Loops until goal is achieved
Error recovery  | User must re-prompt | Self-correcting behavior
CORE CONCEPT

The Agent Loop

Every agent follows a fundamental observe → think → act → iterate cycle until the task is complete or a stopping condition is met.

  • Observe: Gather context, read environment
  • Think: Plan next step, select tools
  • Act: Execute tool call, make changes
  • Iterate: Check results; continue, or stop when the goal is reached
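The loop above can be sketched in plain Python. Here the hypothetical `think`, `act`, and `observe` callables stand in for the LLM and its tools; the step budget is the stopping condition alongside goal completion:

```python
def run_agent_loop(think, act, observe, max_steps=10):
    """Generic observe -> think -> act loop with a step budget.

    think(observation) returns the next action, or None when the goal
    is reached. act(action) performs it; observe() gathers fresh context.
    """
    for _ in range(max_steps):
        observation = observe()       # OBSERVE: gather context
        action = think(observation)   # THINK: plan the next step
        if action is None:            # stopping condition: goal reached
            return "goal_reached"
        act(action)                   # ACT: execute, then ITERATE
    return "step_budget_exhausted"

# Toy example: "act" increments a counter until the observation hits 3.
state = {"n": 0}
status = run_agent_loop(
    think=lambda obs: None if obs >= 3 else "increment",
    act=lambda action: state.update(n=state["n"] + 1),
    observe=lambda: state["n"],
)
```

Real agents replace `think` with an LLM call and `act` with tool execution, but the control flow is the same.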
CLAUDE

Claude as an Agent

Claude is purpose-built for agentic workloads — its architecture and training make it one of the strongest foundation models for autonomous task execution.

Why Claude Excels

  • Extended thinking for complex reasoning chains
  • Tool use deeply integrated into the model
  • Strong instruction following & planning
  • Reliable structured output (JSON, XML)
  • Long context window for large codebases

Agentic Capabilities

  • Code execution — write, run, debug
  • File manipulation — read, edit, create
  • Search — grep, glob, web search
  • Shell access — bash commands
  • Multi-turn — maintains context across steps

Planning & Self-Correction

Claude naturally breaks down complex tasks into sub-tasks, validates intermediate results, backtracks when errors are detected, and provides reasoning transparency through extended thinking.

CONFIGURATION

agents.md — Your Agent's Identity File

The agents.md (or AGENTS.md) file tells Claude who it is, what it can do, and how it should behave within a project. It lives at the root of your repository.

# AGENTS.md

## Identity
You are a senior backend engineer assistant
specializing in Python microservices.

## Capabilities
- Read and modify files in src/
- Run tests with pytest
- Access the staging database (read-only)
- Create pull requests via GitHub CLI

## Guidelines
- Always run tests before committing
- Follow the project's PEP 8 style guide
- Never modify production configs
- Ask before deleting files

## Project Context
- Python 3.12 + FastAPI
- PostgreSQL with SQLAlchemy ORM
- Docker Compose for local dev

What It Is

A markdown file that provides Claude with persistent context about your project and its role

Purpose

  • Defines the agent's persona
  • Declares available capabilities
  • Sets behavioral boundaries
  • Provides project-specific context

How to Write One

  • Be specific about tools & permissions
  • Include project conventions
  • Define what the agent should NOT do
  • Keep it concise but comprehensive
DEEP DIVE

agents.md Structure

Identity & Role

  • Who the agent is
  • Domain expertise
  • Tone and communication style
  • Target audience

## Identity

Capabilities Declaration

  • Tools the agent may use
  • Files/directories accessible
  • APIs & services available
  • Permission boundaries

## Capabilities

Behavioral Guidelines

  • Coding conventions
  • Testing requirements
  • Things to avoid
  • Escalation rules

## Guidelines

Project Context

  • Tech stack & frameworks
  • Architecture overview
  • Directory structure
  • Build & deploy process

Restrictions & Safety

  • Files never to modify
  • Actions requiring approval
  • Sensitive data handling
  • Production safeguards

Claude Code automatically reads AGENTS.md, .claude/agents.md, and any AGENTS.md files in parent directories, merging them hierarchically.

SDK

Claude Agent SDK

The Anthropic Agent SDK provides a Python framework for building production-grade agents with Claude, including tool management, orchestration, and guardrails.

Overview

  • Open-source Python SDK
  • Built on top of the Anthropic API
  • First-class tool use support
  • Multi-agent orchestration built in
  • Streaming & async support

What It Enables

  • Declarative agent definitions
  • Typed tool schemas with validation
  • Agent handoffs & delegation
  • Guardrail integration
  • Tracing & observability

Installation

# Install the SDK
pip install anthropic-agents

# Or with extras
pip install anthropic-agents[all]

# Verify installation
python -c "import agents; print('OK')"

Key Components

  • Agent — core agent class
  • Runner — execution engine
  • Tool — tool definitions
  • Guardrail — safety checks
  • Handoff — agent delegation
CODE

Building Your First Agent

from agents import Agent, Runner, tool

@tool
def read_file(path: str) -> str:
    """Read contents of a file."""
    with open(path, 'r') as f:
        return f.read()

@tool
def write_file(path: str, content: str) -> str:
    """Write content to a file."""
    with open(path, 'w') as f:
        f.write(content)
    return f"Wrote {len(content)} bytes to {path}"

@tool
def run_tests() -> str:
    """Run the project test suite."""
    import subprocess
    result = subprocess.run(
        ["pytest", "--tb=short", "-q"],
        capture_output=True, text=True
    )
    return result.stdout + result.stderr

# Define the agent
coding_agent = Agent(
    name="CodingAssistant",
    instructions="""You are a Python developer.
    Read files, make changes, then run tests
    to verify your work.""",
    tools=[read_file, write_file, run_tests],
    model="claude-sonnet-4-20250514",
)

# Run the agent
result = Runner.run_sync(
    coding_agent,
    "Fix the bug in src/parser.py where "
    "empty input causes a crash"
)
print(result.final_output)

Step-by-Step

  • 1. Define tools with @tool decorator
  • 2. Create an Agent with instructions
  • 3. Attach tools to the agent
  • 4. Run with Runner.run_sync()

What Happens

The agent enters its loop: reads the file, identifies the bug, writes a fix, runs tests, and iterates until tests pass.

Key Features Used

  • Typed tool parameters
  • Auto-generated tool schemas
  • Agent loop with self-correction
  • Natural language instructions
TOOLS

Agent Tools

Tools give agents the ability to interact with the real world. Each tool has a name, description, input schema, and an execution function.

# Simple tool with the @tool decorator
@tool
def search_codebase(
    query: str,
    file_type: str = "py"
) -> str:
    """Search the codebase for a pattern.

    Args:
        query: Regex pattern to search for
        file_type: File extension filter
    """
    import subprocess
    result = subprocess.run(
        ["rg", query, "--type", file_type,
         "--json"],
        capture_output=True, text=True
    )
    return result.stdout

# Tool with complex schema
@tool
def create_github_issue(
    title: str,
    body: str,
    labels: list[str] | None = None,
    assignees: list[str] | None = None
) -> str:
    """Create a GitHub issue."""
    # Implementation here
    return f"Created issue: {title}"

Defining Custom Tools

  • Use @tool decorator for simplicity
  • Type hints become the JSON schema
  • Docstring becomes the description
  • Return value goes back to the agent

Tool Schema (auto-generated)

{
  "name": "search_codebase",
  "description": "Search the codebase...",
  "parameters": {
    "query": {"type": "string"},
    "file_type": {
      "type": "string",
      "default": "py"
    }
  }
}
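A minimal sketch of how a decorator could derive such a schema from type hints and the docstring using the standard library (the SDK's actual internals may differ):

```python
import inspect

# Map Python annotations to JSON-schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Build a JSON-schema-like dict from a function's signature and docstring."""
    params = {}
    for name, p in inspect.signature(fn).parameters.items():
        spec = {"type": PY_TO_JSON.get(p.annotation, "string")}
        if p.default is not inspect.Parameter.empty:
            spec["default"] = p.default   # optional parameter: record default
        params[name] = spec
    doc = (fn.__doc__ or "").strip()
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": params,
    }

def search_codebase(query: str, file_type: str = "py") -> str:
    """Search the codebase for a pattern."""
    return ""

schema = tool_schema(search_codebase)
```

This is why precise type hints and docstrings matter: they are the only description of the tool the model ever sees.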

Execution Flow

Agent decides to call tool → SDK validates inputs → function executes → result returned to agent → agent continues reasoning

ARCHITECTURE

Multi-Agent Architectures

Complex tasks benefit from multiple specialized agents working together. Three primary patterns dominate multi-agent design.

Orchestrator

Orchestrator → Agent A | Agent B | Agent C

Central agent delegates subtasks to specialized workers

Pipeline

Stage 1 → Stage 2 → Stage 3 → Stage 4

Each agent processes output from the previous stage sequentially

Peer-to-Peer

Agent A ↔ Agent B ↔ Agent C

Agents communicate directly, no central coordinator needed
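Of the three, the pipeline is the simplest to sketch: it reduces to feeding each stage's output into the next. A toy version with plain callables standing in for agents:

```python
from functools import reduce

def run_pipeline(stages, initial_input):
    """Feed each stage's output to the next, like a sequential agent chain."""
    return reduce(lambda data, stage: stage(data), stages, initial_input)

# Hypothetical stages: draft -> review -> format
draft = lambda spec: f"draft of {spec}"
review = lambda text: text + " (reviewed)"
fmt = lambda text: text.upper()

out = run_pipeline([draft, review, fmt], "login page")
```

In a real system each stage would be a `Runner.run(...)` call; the composition logic stays the same.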

PATTERNS

Agent Communication

Multi-agent systems need well-defined communication protocols to share information, delegate work, and coordinate effectively.

Message Passing

  • Structured messages between agents
  • Task descriptions with context
  • Results & status updates
  • Error reports & retries
result = await runner.handoff(
    target=review_agent,
    message="Review this PR",
    context={"diff": diff}
)

Shared Context

  • Common memory store
  • Shared file system access
  • Database for state persistence
  • Event logs all agents can read
ctx = SharedContext()
ctx.set("plan", plan_output)
# Other agents can read:
plan = ctx.get("plan")
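The `SharedContext` used above is not spelled out; a minimal thread-safe version, assuming a simple key-value contract, could look like this:

```python
import threading

class SharedContext:
    """Minimal thread-safe key-value store that multiple agents can share."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:          # serialize writers
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:          # consistent reads under concurrent writes
            return self._data.get(key, default)

ctx = SharedContext()
ctx.set("plan", ["explore", "migrate", "verify"])
```

A production version would typically back this with Redis or a database so state survives process restarts.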

Handoff Protocols

  • Explicit agent-to-agent delegation
  • Transfer of conversation control
  • Context preservation on handoff
  • Return results to caller
agent = Agent(
    name="Router",
    handoffs=[
        billing_agent,
        support_agent,
        technical_agent,
    ]
)
META

Claude Code as an Agent

Claude Code is itself a fully-featured AI agent — it uses the agent loop pattern with a rich set of built-in tools to accomplish software engineering tasks.

Built-in Agent Tools

Tool  | Purpose
Read  | Read files from the filesystem
Edit  | Make precise edits to files
Write | Create new files
Bash  | Execute shell commands
Glob  | Find files by pattern
Grep  | Search file contents

How It Works

  • Observe: Reads files, searches codebase, checks git status
  • Think: Analyzes code, plans changes, considers edge cases
  • Act: Edits files, runs commands, creates commits
  • Iterate: Verifies changes, fixes errors, re-runs tests

Sub-agent Capability

Claude Code can spawn sub-agents for specialized tasks — a dedicated agent for exploration, another for planning, and task-specific workers. This enables parallel investigation and division of labor within a single session.

PATTERNS

Sub-agent Patterns

Sub-agents are specialized, scoped instances that handle specific aspects of a larger task. They inherit context but have focused instructions and limited tool access.

Explore Agent

  • Read-only access to codebase
  • Searches for patterns & dependencies
  • Maps architecture & data flow
  • Returns structured findings

Tools: Read, Glob, Grep

Plan Agent

  • Receives exploration results
  • Creates step-by-step implementation plan
  • Identifies risks & dependencies
  • Estimates complexity per step

Tools: Read (for validation)

Task Agent

  • Executes one step of the plan
  • Full read/write tool access
  • Reports success or failure
  • Scoped to specific files/modules

Tools: Read, Edit, Write, Bash

Delegation Pattern

explorer = Agent(name="Explorer", tools=[Read, Glob, Grep],
                 instructions="Find all usages of the deprecated API...")

planner = Agent(name="Planner", tools=[Read],
                instructions="Create a migration plan based on findings...")

worker = Agent(name="Worker", tools=[Read, Edit, Write, Bash],
               instructions="Execute step N of the migration plan...")

# Orchestrator delegates sequentially
findings = await Runner.run(explorer, "Map deprecated API usage")
plan = await Runner.run(planner, f"Plan migration: {findings.output}")
result = await Runner.run(worker, f"Execute: {plan.output}")
ORCHESTRATION

Agent Orchestration

Orchestration manages the lifecycle, coordination, and execution of multiple agents working toward a shared objective.

Managing Multiple Agents

from agents import Agent, Runner

orchestrator = Agent(
    name="ProjectManager",
    instructions="""You coordinate a team:
    - frontend_agent: React components
    - backend_agent: API endpoints
    - test_agent: Integration tests
    Delegate tasks appropriately.""",
    handoffs=[
        frontend_agent,
        backend_agent,
        test_agent,
    ]
)

# The orchestrator decides who to call
result = await Runner.run(
    orchestrator,
    "Add a user profile page with API"
)

Parallel Execution

  • Independent tasks run simultaneously
  • Reduces total wall-clock time
  • Results aggregated by orchestrator
  • Failures isolated per agent
import asyncio
tasks = [
    Runner.run(lint_agent, code),
    Runner.run(test_agent, code),
    Runner.run(docs_agent, code),
]
results = await asyncio.gather(*tasks)

Coordination Strategies

  • Sequential: A → B → C (pipeline)
  • Parallel: A | B | C (fan-out)
  • Conditional: If A fails, try B
  • Iterative: Repeat until quality bar met
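The iterative strategy can be sketched as a retry loop gated on a quality check. Here the hypothetical `produce` callable stands in for an agent run (seeded with its previous attempt) and `score` for an evaluator agent:

```python
def iterate_until_quality(produce, score, threshold, max_rounds=5):
    """Re-run an agent, feeding back its last attempt, until quality passes."""
    attempt = None
    for round_num in range(1, max_rounds + 1):
        attempt = produce(attempt)        # agent run, seeded with feedback
        if score(attempt) >= threshold:   # quality bar met: stop iterating
            return attempt, round_num
    return attempt, max_rounds            # bar never met: return best effort

# Toy example: each round improves the score by 2; bar is 6.
result, rounds = iterate_until_quality(
    produce=lambda prev: (prev or 0) + 2,
    score=lambda attempt: attempt,
    threshold=6,
)
```

The `max_rounds` cap doubles as the cost guardrail mentioned later: without it, a quality bar the agent cannot reach becomes a runaway loop.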
SAFETY

Guardrails & Safety

Autonomous agents require robust safety measures. Guardrails prevent harmful actions, enforce policies, and maintain human oversight.

Permission Systems

  • Tool-level access control
  • File/directory allow-lists
  • Network access restrictions
  • Tiered permission levels
  • Principle of least privilege

Sandboxing

  • Docker containers for execution
  • Read-only filesystem mounts
  • Network isolation
  • Resource limits (CPU, memory)
  • Timeout enforcement
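Timeout enforcement, at minimum, can be done at the subprocess level; a sketch for wrapping shell tools so a hung command reports a failure instead of stalling the agent:

```python
import subprocess

def run_sandboxed(cmd, timeout_s=10):
    """Run a shell tool with a hard timeout; report rather than hang."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return {"ok": proc.returncode == 0, "output": proc.stdout}
    except subprocess.TimeoutExpired:
        # The child is killed; the agent sees a structured failure it can react to.
        return {"ok": False, "output": f"timed out after {timeout_s}s"}

result = run_sandboxed(["echo", "hello"])
```

Container isolation, read-only mounts, and CPU/memory limits sit below this layer (e.g. in the Docker runtime), but the timeout alone already prevents the most common hang.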

Human-in-the-Loop

  • Approval for destructive actions
  • Review before git push
  • Confirmation for API calls
  • Escalation on uncertainty
  • Audit trail for all actions
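One way to implement the approval gate is a predicate checked before every tool call; the tool names and the shell heuristic below are illustrative, not a fixed policy:

```python
# Hypothetical set of tools considered destructive in this project.
DESTRUCTIVE = {"delete_file", "drop_table", "git_push", "send_email"}

def requires_approval(tool_name, args):
    """Return True when a human must confirm before the tool executes."""
    if tool_name in DESTRUCTIVE:
        return True
    # Heuristic: recursive deletes via the shell are always gated.
    if tool_name == "bash" and "rm -rf" in args.get("command", ""):
        return True
    return False

gated = requires_approval("bash", {"command": "rm -rf build/"})
```

The agent runtime would pause on `True`, surface the proposed call to a human, and only execute after explicit confirmation, logging the decision for the audit trail.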

Guardrail Implementation

import re

from agents import Agent, GuardrailResult, guardrail

@guardrail
async def no_secrets_guardrail(ctx, agent, input_text) -> GuardrailResult:
    """Prevent the agent from outputting secrets or credentials."""
    patterns = [r"sk-[a-zA-Z0-9]{48}", r"AKIA[0-9A-Z]{16}", r"ghp_[a-zA-Z0-9]{36}"]
    for pattern in patterns:
        if re.search(pattern, input_text):
            return GuardrailResult(
                should_block=True,
                message="Blocked: potential secret detected",
            )
    return GuardrailResult(should_block=False)

agent = Agent(name="SafeAgent", guardrails=[no_secrets_guardrail])
STATE

Agent Memory & State

Effective agents maintain context across interactions. Memory systems range from simple conversation history to persistent knowledge bases.

Conversation History

  • Full message log per session
  • Tool calls & results preserved
  • Summarization for long sessions
  • Context window management
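A common first step in context window management is trimming: keep the system prompt plus the most recent turns and drop (or summarize) the rest. A sketch over the familiar role/content message format:

```python
def trim_history(messages, max_messages=6):
    """Keep the system prompt plus the most recent conversation turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]   # drop the oldest non-system turns

# Toy session: one system prompt plus ten user turns.
history = [{"role": "system", "content": "You are an agent."}]
history += [{"role": "user", "content": f"msg {i}"} for i in range(10)]
trimmed = trim_history(history, max_messages=3)
```

Production agents usually summarize the dropped turns instead of discarding them outright, so earlier findings survive in compressed form.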

Persistent Memory

  • Cross-session knowledge store
  • User preferences & patterns
  • Project-specific learnings
  • Claude Code: ~/.claude/memory

Context Management

  • Prioritize recent & relevant info
  • Evict stale context intelligently
  • Compress repeated patterns
  • Key-value scratchpad for state

Memory Architecture

# Working memory (within session)
agent.context["current_task"] = task
agent.context["findings"] = []

# Persistent memory (across sessions)
memory_store = VectorMemory(
    path="./agent_memory",
    embedding_model="voyage-3"
)
memory_store.add("User prefers tabs")
relevant = memory_store.search(query)

Common Pitfalls

  • Context overflow — exceeding token limits causes lost information
  • Stale state — outdated memory leads to wrong decisions
  • No summarization — raw history is expensive
  • Missing persistence — losing learnings between sessions
EXAMPLES

Real-world Agent Examples

Code Review Bot

  • Triggered on PR creation
  • Reads diff & full file context
  • Checks style, logic, security
  • Posts inline review comments
  • Suggests specific fixes

Tools: GitHub API, Read, Grep

Model: Claude Sonnet (cost-effective)

Data Pipeline Agent

  • Monitors data quality metrics
  • Detects schema changes
  • Auto-generates transformations
  • Runs validation queries
  • Alerts on anomalies

Tools: SQL, Python, Slack API

Model: Claude Haiku (low-latency)

Customer Support

  • Triages incoming tickets
  • Searches knowledge base
  • Drafts responses for review
  • Escalates complex issues
  • Learns from past resolutions

Tools: KB Search, CRM, Email

Model: Claude Sonnet + handoffs

Common Architecture

All three follow the same pattern: trigger event → gather context → reason about action → execute with tools → verify & report. The difference is in the tools available and the domain-specific instructions.

DEBUG

Debugging Agents

Agents are harder to debug than traditional software because behavior emerges from LLM reasoning. Observability is the key to understanding agent decisions.

Observability & Logging

from agents import Agent, Runner
from agents.tracing import TracingConfig

# Enable detailed tracing
config = TracingConfig(
    log_level="DEBUG",
    trace_tools=True,
    trace_reasoning=True,
    output_dir="./traces"
)

result = await Runner.run(
    agent, prompt,
    tracing=config
)

# Inspect the trace
for step in result.trace.steps:
    print(f"[{step.type}] {step.summary}")
    if step.type == "tool_call":
        print(f"  Tool: {step.tool_name}")
        print(f"  Input: {step.input}")
        print(f"  Output: {step.output[:200]}")
    elif step.type == "reasoning":
        print(f"  Thought: {step.text[:200]}")

What to Look For

  • Tool selection errors — wrong tool for the job
  • Input malformation — bad parameters to tools
  • Reasoning loops — agent repeating the same action
  • Context loss — forgetting earlier findings
  • Premature stopping — declaring done too early
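Reasoning loops, in particular, are easy to catch mechanically once traces are available: flag any run where the same (tool, input) pair repeats back-to-back. A sketch over a list of tool-call records:

```python
def detect_repeat_loop(tool_calls, window=3):
    """True if the same (tool, input) pair occurs `window` times in a row."""
    for i in range(len(tool_calls) - window + 1):
        chunk = tool_calls[i:i + window]
        if len(set(chunk)) == 1:   # identical calls repeated consecutively
            return True
    return False

# Toy trace: the agent re-runs the failing test suite without changing anything.
trace = [
    ("read_file", "a.py"),
    ("run_tests", ""),
    ("run_tests", ""),
    ("run_tests", ""),
]
looping = detect_repeat_loop(trace)
```

Wiring a check like this into the runner lets you abort or escalate instead of burning tokens on a stuck agent.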

Tracing Agent Decisions

  • Log every LLM call with full prompt
  • Record tool inputs & outputs
  • Track token usage per step
  • Measure latency per action
  • Compare traces between runs

Common Failure Modes

  • Infinite loops on error recovery
  • Hallucinating file paths or APIs
  • Misinterpreting tool output format
  • Overly broad tool calls (e.g., rm -rf)
PERFORMANCE

Agent Performance

Agent systems involve tradeoffs between latency, cost, and accuracy. Understanding these tradeoffs is critical for production deployments.

Latency Optimization

  • Use faster models for simple steps
  • Parallelize independent tool calls
  • Cache repeated tool results
  • Minimize context window size
  • Stream responses to users
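Caching repeated tool results is straightforward when the tool is deterministic: memoize on the tool name plus its serialized arguments. A sketch (names are illustrative):

```python
import json

class ToolCache:
    """Memoize deterministic tool calls within a session."""

    def __init__(self):
        self._cache = {}
        self.hits = 0

    def call(self, tool, **kwargs):
        # Key on the tool name plus a canonical serialization of its args.
        key = (tool.__name__, json.dumps(kwargs, sort_keys=True))
        if key in self._cache:
            self.hits += 1
        else:
            self._cache[key] = tool(**kwargs)
        return self._cache[key]

executions = []
def read_file(path):
    executions.append(path)            # track real executions
    return f"contents of {path}"

cache = ToolCache()
cache.call(read_file, path="a.py")
cache.call(read_file, path="a.py")     # second call served from cache
```

Only cache tools whose output cannot change mid-run (file reads are safe until the agent itself writes to the file; then the entry must be invalidated).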

Cost Management

  • Route to Haiku for triage tasks
  • Use Sonnet for most agent work
  • Reserve Opus for hard reasoning
  • Summarize context to reduce tokens
  • Set max iteration limits
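The routing rule can be as simple as a heuristic over task traits; the tier names follow the bullets above, and the word-count threshold is purely illustrative:

```python
def route_model(task, needs_deep_reasoning=False):
    """Pick a model tier from rough task traits (thresholds are illustrative)."""
    if needs_deep_reasoning:
        return "opus"    # reserve the largest model for hard reasoning
    if len(task.split()) < 10:
        return "haiku"   # short triage-style tasks go to the fast, cheap tier
    return "sonnet"      # default tier for most agent work

triage = route_model("label this ticket")
standard = route_model(
    "refactor the payment service to support idempotent retries across regions"
)
hard = route_model("prove the migration preserves invariants",
                   needs_deep_reasoning=True)
```

In practice teams often replace the heuristic with a small classifier, or let an orchestrator agent pick the tier per sub-task.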

Accuracy Tradeoffs

  • More iterations = higher accuracy
  • Bigger model = better reasoning
  • More context = better decisions
  • Verification steps add cost but improve quality
  • Self-consistency checks improve reliability
Strategy                          | Latency Impact | Cost Impact | Accuracy Impact
Model routing (Haiku/Sonnet/Opus) | -40%           | -60%        | -10%
Parallel tool execution           | -50%           | Neutral     | Neutral
Context summarization             | -20%           | -30%        | -5%
Verification loop                 | +30%           | +25%        | +15%
Extended thinking                 | +20%           | +15%        | +20%
ENTERPRISE

Enterprise Agent Deployment

Moving agents from prototype to production requires attention to scaling, monitoring, and compliance.

Scaling

  • Horizontal scaling with worker pools
  • Queue-based task distribution
  • Rate limiting per user/team
  • Graceful degradation under load
  • Auto-scaling based on queue depth
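Queue-based task distribution can be sketched with the standard library: a fixed worker pool drains a shared queue, and the number of workers caps concurrent agent runs. The `handle` callable below stands in for an agent invocation:

```python
import queue
import threading

def run_worker_pool(tasks, handle, num_workers=4):
    """Distribute agent tasks to a fixed worker pool via a shared queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()   # queue is fully loaded before start
            except queue.Empty:
                return                   # no work left: worker exits
            out = handle(task)           # stand-in for an agent run
            with lock:
                results.append(out)
            q.task_done()

    for t in tasks:
        q.put(t)
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

results = run_worker_pool(range(10), handle=lambda t: t * 2, num_workers=3)
```

A production setup swaps the in-process queue for a broker like Redis (as in the deployment sketch below the checklist) so workers can scale across machines.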

Monitoring

  • Dashboard for agent health metrics
  • Success/failure rate tracking
  • Token usage & cost attribution
  • Latency percentiles (p50, p95, p99)
  • Alerting on anomalous behavior

Compliance

  • Audit logs for all agent actions
  • Data residency requirements
  • PII detection & redaction
  • Role-based access controls
  • SOC 2 / HIPAA compliance paths

Infrastructure Pattern

# Agent deployment architecture
services:
  agent-gateway:
    image: agent-gateway:latest
    replicas: 3
    environment:
      - ANTHROPIC_API_KEY=${API_KEY}
      - MAX_CONCURRENT_AGENTS=50
  agent-worker:
    image: agent-worker:latest
    replicas: 10
    resources:
      limits: { memory: "2Gi", cpu: "1" }
  redis:
    image: redis:7-alpine  # Task queue

Production Checklist

  • ☐ Rate limits & circuit breakers
  • ☐ Timeout on all agent runs
  • ☐ Fallback behavior on API errors
  • ☐ Cost alerting per user/team
  • ☐ Secrets management (no hardcoded keys)
  • ☐ Sandbox all tool executions
  • ☐ Human review for sensitive actions
  • ☐ Retention policy for traces/logs
FUTURE

The Future of Agents

Agent capabilities are expanding rapidly. The next wave brings more autonomy, richer modalities, and collaborative agent ecosystems.

Autonomous Coding

  • End-to-end feature implementation
  • Agents that write & maintain tests
  • Self-reviewing pull requests
  • Automated bug triage & fixing
  • Continuous codebase improvement

Already emerging with Claude Code

Multi-Modal Agents

  • Vision: screenshot analysis, UI testing
  • Audio: voice-driven agent control
  • Video: monitoring & analysis
  • Combined modalities for richer understanding
  • Computer use & browser interaction

Claude's vision capabilities enable this today

Agent Swarms

  • Dozens of agents collaborating
  • Emergent problem-solving behavior
  • Specialized agent marketplaces
  • Cross-organization agent coordination
  • Self-organizing agent teams

Research frontier, early experiments underway

Key Trends to Watch

Longer autonomy windows — agents running for hours, not minutes. Better tool ecosystems — standardized tool interfaces (MCP, now under the Linux Foundation's Agentic AI Foundation). Agent-to-agent protocols — standardized communication. Formal verification — proving agent safety properties mathematically.

SUMMARY

Summary & Best Practices

Key Takeaways

  • Agents are LLMs + tools + loops
  • The observe-think-act cycle is fundamental
  • agents.md defines agent identity & behavior
  • The Claude Agent SDK enables production agents
  • Multi-agent systems unlock complex workflows
  • Safety guardrails are non-negotiable

Design Principles

  • Start simple — single agent first
  • Scope narrowly — clear tool boundaries
  • Fail gracefully — always have fallbacks
  • Log everything — observability first
  • Test iteratively — eval-driven development

Best Practices

  • Write clear, specific agent instructions
  • Use the right model tier for each task
  • Implement human-in-the-loop for high-stakes actions
  • Set iteration limits to prevent runaway costs
  • Version your agents.md alongside code
  • Monitor token usage and set budgets

Anti-patterns to Avoid

  • God agent — one agent that does everything
  • No guardrails — unrestricted tool access
  • Vague instructions — ambiguous agent prompts
  • No observability — can't debug what you can't see
  • Premature multi-agent — complexity without need

Next: 08 — Model Context Protocol (MCP)