PRESENTATION 05 OF 10

The Claude API & SDKs

Building with Anthropic's Developer Platform

REST API Python SDK TypeScript SDK Streaming

Anthropic Claude Series

01

API Overview

The Claude API is a REST API that provides programmatic access to Claude models for building AI-powered applications.

Architecture: Your Application (Python / TS / cURL) → HTTPS → Anthropic API (api.anthropic.com: Messages, Batches) → Claude Models (Opus / Sonnet / Haiku, with Vision, Tools, and Extended Thinking)

Chat & Generation

Build chatbots, content generators, code assistants

Analysis & Vision

Analyze text, images, documents, and data

Tool Use & Agents

Create autonomous agents with function calling

02

Authentication

All API requests require an API key passed via the x-api-key header.

Getting Your Key

  • Visit console.anthropic.com
  • Create an account or sign in
  • Navigate to API Keys section
  • Generate a new key (starts with sk-ant-)
  • Store securely — shown only once

Required Headers

  • x-api-key: YOUR_API_KEY
  • content-type: application/json
  • anthropic-version: 2023-06-01
  • Optional: anthropic-beta for beta features

# cURL example
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,
       "messages":[{"role":"user","content":"Hello, Claude!"}]}'

Never commit API keys to source control. Use environment variables or secret managers.

03

Messages API — The Core Endpoint

The Messages API is the primary interface for interacting with Claude. It uses a structured conversation format.

POST /v1/messages

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "system": "You are a helpful assistant.",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}

Key Features

  • Structured messages — explicit role/content pairs
  • System prompts — separate from conversation
  • Multi-turn — pass full conversation history
  • Streaming — real-time token delivery
  • Tool use — function calling built in
  • Vision — send images inline

Base URL: https://api.anthropic.com — API version: 2023-06-01

04

Request Anatomy

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a senior developer.",
    temperature=0.7,
    top_p=0.9,
    messages=[
        {
            "role": "user",
            "content": "Explain async/await."
        }
    ],
    metadata={"user_id": "user-123"},
    stop_sequences=["\n\nHuman:"]
)
Parameter        Required   Description
model            Yes        Model identifier
max_tokens       Yes        Max output tokens
messages         Yes        Conversation array
system           No         System prompt string
temperature      No         Randomness (0–1)
top_p            No         Nucleus sampling
top_k            No         Top-k sampling
stop_sequences   No         Custom stop strings
stream           No         Enable streaming
metadata         No         Request metadata
05

Response Structure

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-20250514",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 25,
    "output_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}

Content Blocks

  • text — standard text response
  • tool_use — function call request
  • thinking — extended thinking output

Stop Reasons

  • end_turn — natural completion
  • max_tokens — hit token limit
  • stop_sequence — matched stop string
  • tool_use — wants to call a tool

Usage Tracking

input_tokens and output_tokens drive billing; cache_creation_input_tokens and cache_read_input_tokens report prompt-cache writes and hits.

06

Python SDK

The official Python SDK provides a typed, ergonomic interface to the Claude API.

Installation

pip install anthropic

Basic Usage

import anthropic

# Uses ANTHROPIC_API_KEY env var
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": "Write a haiku about Python."}
    ]
)

print(message.content[0].text)
# Indented with care,
# Whitespace speaks louder than words,
# Pythonic and clean.

Async Support

import anthropic
import asyncio

async def main():
    client = anthropic.AsyncAnthropic()

    message = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {"role": "user",
             "content": "Explain decorators."}
        ]
    )
    print(message.content[0].text)

asyncio.run(main())

SDK Features

  • Full type annotations & autocompletion
  • Automatic retries with backoff
  • Streaming helpers built in
  • Pydantic model responses
07

TypeScript SDK

The official TypeScript SDK provides full type safety and works in Node.js, Deno, and edge runtimes.

Installation

npm install @anthropic-ai/sdk

Basic Usage

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
// Uses ANTHROPIC_API_KEY env var

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Explain TypeScript generics."
    }
  ],
});

console.log(message.content[0].text);

With Explicit Key

const client = new Anthropic({
  apiKey: process.env.MY_CUSTOM_KEY,
  maxRetries: 3,
  timeout: 60_000, // 60 seconds
});

// Access response metadata
const msg = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 512,
  messages: [
    { role: "user", content: "Hello!" }
  ],
});

console.log(`Tokens used:
  In:  ${msg.usage.input_tokens}
  Out: ${msg.usage.output_tokens}`);

Runtime Compatibility

Works with Node.js 18+, Deno, Bun, Cloudflare Workers, and Vercel Edge.

08

Multi-turn Conversations

Claude is stateless — you must pass the full conversation history with each request. Roles must alternate between user and assistant.

conversation = []

def chat(user_input: str) -> str:
    conversation.append({
        "role": "user",
        "content": user_input
    })

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system="You are a helpful tutor.",
        messages=conversation
    )

    assistant_msg = response.content[0].text

    conversation.append({
        "role": "assistant",
        "content": assistant_msg
    })

    return assistant_msg

# Usage
chat("What is recursion?")
chat("Can you give me an example?")
chat("How does the call stack work?")

Role Alternation Rules

  • First message must be user
  • Roles must alternate
  • Cannot have two consecutive user or assistant messages
  • Last message is normally user (a trailing assistant message acts as a prefill)
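A small helper can catch alternation mistakes locally before the API rejects the request. This is an illustrative sketch, not part of the SDK:

```python
def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError if the message list violates the alternation rules."""
    if not messages:
        raise ValueError("messages must not be empty")
    if messages[0]["role"] != "user":
        raise ValueError("first message must be from the user")
    for prev, curr in zip(messages, messages[1:]):
        if prev["role"] == curr["role"]:
            raise ValueError(
                f"consecutive {curr['role']} messages are not allowed"
            )
```

Call it right before `client.messages.create(...)` to fail fast with a readable error.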

Best Practices

  • Trim old messages to stay within context limits
  • Summarize earlier turns if conversation grows long
  • Track token count per turn for cost management
  • Consider sliding window approach
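A sliding window can be sketched with a character budget as a rough stand-in for tokens (~4 characters per token); exact counts would come from the count_tokens API. A minimal, illustrative version:

```python
def trim_history(messages: list[dict], max_chars: int = 40_000) -> list[dict]:
    """Keep the most recent messages whose total content length fits the budget.

    Drops the oldest turns first and always keeps the latest message.
    Character count is only a rough proxy for tokens (~4 chars/token).
    """
    kept, total = [], 0
    for msg in reversed(messages):
        size = len(str(msg["content"]))
        if kept and total + size > max_chars:
            break
        kept.append(msg)
        total += size
    kept.reverse()
    # The window must still open with a user turn
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Pass `trim_history(conversation)` as `messages=` instead of the raw list; the original list stays intact for logging or summarization.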
09

System Prompts

System prompts set context and behavior for Claude. They are separate from the conversation and apply to the entire interaction.

# Simple system prompt
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a senior Python developer. "
           "Respond with clean, well-documented "
           "code. Use type hints throughout.",
    messages=[
        {"role": "user",
         "content": "Write a binary search."}
    ]
)

# System prompt with cache_control
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert analyst...",
            "cache_control": {
                "type": "ephemeral"
            }
        }
    ],
    messages=[...]
)

Effective System Prompts

  • Define role and personality
  • Set output format expectations
  • Specify constraints and boundaries
  • Include domain knowledge or context
  • Define tone and communication style

Tips

  • Be specific — vague prompts give vague results
  • Use XML tags to structure complex instructions
  • Put reference material in system prompt for caching
  • System prompt tokens count toward input tokens
  • Can be a string or array of content blocks
10

Streaming

Streaming delivers tokens in real-time via Server-Sent Events (SSE), reducing perceived latency.

Python Streaming

# Using the stream manager
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": "Tell me a story."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get the final accumulated message
    # (inside the with block, before the stream closes)
    final = stream.get_final_message()
    print(f"\nTokens: {final.usage}")

TypeScript Streaming

const stream = await client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    { role: "user",
      content: "Tell me a story." }
  ],
});

for await (const event of stream) {
  if (event.type === "content_block_delta"
    && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

const finalMsg = await stream.finalMessage();
console.log(finalMsg.usage);

SSE Event Types

message_start → content_block_start → content_block_delta (repeated) → content_block_stop → message_delta → message_stop
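The delta events carry the text; they can be stitched together client-side. A sketch using plain dicts shaped like the SSE payloads (the SDKs expose typed event objects instead):

```python
def assemble_text(events: list[dict]) -> str:
    """Concatenate text_delta payloads from a stream of SSE-style events."""
    parts = []
    for event in events:
        if (event["type"] == "content_block_delta"
                and event["delta"]["type"] == "text_delta"):
            parts.append(event["delta"]["text"])
    return "".join(parts)

events = [
    {"type": "message_start"},
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop"},
    {"type": "message_delta"},
    {"type": "message_stop"},
]
print(assemble_text(events))  # Hello
```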

11

Tool Use (Function Calling)

Claude can call external functions you define. You describe tools with JSON Schema, and Claude decides when and how to use them.

Flow: User request ("What's the weather?") → Claude replies with stop_reason: tool_use and a get_weather({...}) call → your function executes and returns a tool_result → Claude replies with stop_reason: end_turn ("It's 72°F...")

Tool Definition

{
  "name": "get_weather",
  "description": "Get current weather for a city",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {"type":"string","description":"City name"},
      "units": {"type":"string","enum":["celsius","fahrenheit"]}
    },
    "required": ["city"]
  }
}

tool_choice Options

  • {"type": "auto"} — Claude decides (default)
  • {"type": "any"} — must use a tool
  • {"type": "tool", "name": "get_weather"} — use specific tool
  • {"type": "none"} — no tool use
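tool_choice is just another request field. A sketch of the payload shapes as plain dicts (no API call made here; the tool schema is abbreviated, and each dict could be splatted into client.messages.create(**request)):

```python
# Shared request fields (tool schema abbreviated for the example)
base = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tools": [{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
}

# Let Claude decide (the default behavior)
auto_request = {**base, "tool_choice": {"type": "auto"}}
# Force this specific tool to be called
forced_request = {**base, "tool_choice": {"type": "tool", "name": "get_weather"}}
```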
12

Tool Use — Complete Example

import anthropic, json

client = anthropic.Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
        },
        "required": ["city"]
    }
}]

def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temp": 22, "units": units, "condition": "sunny"}  # Mock

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
)

if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    result = get_weather(**tool_block.input)  # Execute the function

    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": tool_block.id, "content": json.dumps(result)}
    ]})

    final = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
    )
    print(final.content[0].text)  # "The weather in Paris is currently sunny at 22°C."
13

Vision / Multimodal

Claude can analyze images alongside text. Supports base64 and URL image sources.

Base64 Image

import base64

with open("chart.png", "rb") as f:
    img_data = base64.standard_b64encode(
        f.read()
    ).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": img_data
                }
            },
            {
                "type": "text",
                "text": "Describe this chart."
            }
        ]
    }]
)

URL Image

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/img.jpg"
                }
            },
            {
                "type": "text",
                "text": "What do you see?"
            }
        ]
    }]
)

Supported Formats

image/jpeg, image/png, image/gif, image/webp. Max ~20 images per request. Each image costs tokens based on dimensions.

14

Extended Thinking

Extended thinking lets Claude reason step-by-step before responding, producing higher-quality answers for complex tasks.

Enabling Extended Thinking

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Prove that sqrt(2) "
                   "is irrational."
    }]
)

# Response contains thinking blocks
for block in response.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking)
    elif block.type == "text":
        print("ANSWER:", block.text)

Key Parameters

  • budget_tokens — max tokens for thinking (min 1024)
  • max_tokens must be > budget_tokens
  • Thinking tokens count toward output billing
  • temperature cannot be adjusted while thinking is enabled
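The two numeric constraints above can be enforced before the request goes out. An illustrative helper (not part of the SDK):

```python
def thinking_params(budget_tokens: int, max_tokens: int) -> dict:
    """Build a thinking config, enforcing the documented constraints:
    budget_tokens >= 1024 and max_tokens > budget_tokens."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must exceed budget_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

Usage: `thinking=thinking_params(10_000, 16_000)` in the create call above.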

Best For

  • Math proofs and complex reasoning
  • Multi-step analysis and planning
  • Code architecture decisions
  • Tasks requiring careful deliberation

Response Format

Content array includes thinking blocks (internal reasoning) followed by text blocks (final answer).

15

Batch API

The Message Batches API processes large volumes of requests asynchronously at a 50% discount on standard token pricing.

Creating a Batch

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": "Summarize AI safety."}
                ]
            }
        },
        {
            "custom_id": "req-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": "Explain RLHF."}
                ]
            }
        }
    ]
)
print(batch.id)  # batch_abc123

Polling & Results

# Check status
batch = client.messages.batches.retrieve(
    batch.id
)
print(batch.processing_status)
# "in_progress" or "ended"

# Iterate results when complete
if batch.processing_status == "ended":
    for result in client.messages.batches \
            .results(batch.id):
        print(result.custom_id)
        if result.result.type == "succeeded":
            msg = result.result.message
            print(msg.content[0].text)

Batch Limits

  • Up to 100,000 requests per batch
  • Results within 24 hours
  • 50% discount on token pricing
  • Results expire after 29 days
16

Token Counting

Understanding tokenization is essential for cost management and staying within context limits.

Count Tokens API

# Count tokens before sending
count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user",
         "content": "Hello, how are you?"}
    ],
    system="You are helpful."
)
print(count.input_tokens)  # e.g., 18

# Check usage after response
response = client.messages.create(...)
print(f"Input:  {response.usage.input_tokens}")
print(f"Output: {response.usage.output_tokens}")
total_cost = (
    response.usage.input_tokens * 3 / 1_000_000
    + response.usage.output_tokens * 15 / 1_000_000
)
print(f"Cost: ${total_cost:.4f}")

Tokenization Rules

  • ~1 token = ~4 characters of English
  • ~75 words = ~100 tokens
  • Code typically uses more tokens per line
  • Non-English text may use more tokens
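The rules of thumb above translate into a quick local estimator; it is only an approximation, and the count_tokens API gives exact numbers:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate using the ~4 characters/token rule of thumb.
    Use client.messages.count_tokens(...) for exact counts."""
    return max(1, len(text) // 4)

print(estimate_tokens("Hello, how are you?"))  # 19 chars -> 4
```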

Context Windows

Model              Context
Claude Opus 4      200K tokens
Claude Sonnet 4    200K tokens
Claude Haiku 3.5   200K tokens

Cost Control Tips

  • Pre-count tokens to estimate costs
  • Use max_tokens to cap output
  • Trim conversation history proactively
  • Use prompt caching for repeated content
17

Error Handling

Robust error handling with retries and exponential backoff ensures reliable API integration.

Python Error Handling

import anthropic
from anthropic import (
    APIError,
    RateLimitError,
    APIConnectionError
)

client = anthropic.Anthropic(
    max_retries=3  # Auto-retries built in
)

try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role":"user",
                   "content":"Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limited: {e.status_code}")
    # SDK auto-retries with backoff
except APIError as e:
    print(f"API error {e.status_code}: {e}")
except APIConnectionError:
    print("Network connectivity issue")

Common Error Codes

Code   Meaning
400    Invalid request (bad params)
401    Authentication failure
403    Permission denied
404    Resource not found
429    Rate limit exceeded
500    Internal server error
529    API overloaded

Retry Strategy

  • SDK handles retries for 429 and 5xx
  • Exponential backoff with jitter
  • Configure max_retries (default: 2)
  • Respect retry-after header
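For callers bypassing the SDK (e.g. raw HTTP), the same strategy can be hand-rolled. A minimal sketch of retries with full-jitter exponential backoff; the function and parameters here are illustrative, not an SDK API:

```python
import random
import time

def call_with_retries(fn, max_retries: int = 3,
                      base: float = 1.0, cap: float = 60.0):
    """Retry fn() on exceptions, sleeping with full-jitter exponential
    backoff: a random delay in [0, min(cap, base * 2**attempt)].
    (The SDK does this automatically; shown for custom HTTP clients.)"""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # budget exhausted, surface the error
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

In production this would catch only retryable errors (429 and 5xx) and honor a retry-after header when present.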
18

Prompt Caching

Prompt caching lets you reuse large prompt prefixes across requests, reducing latency and cost by up to 90%.

Enabling Caching

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LARGE_REFERENCE_DOC,  # 10K tokens
            "cache_control": {
                "type": "ephemeral"
            }
        }
    ],
    messages=[
        {"role": "user",
         "content": "Summarize section 3."}
    ]
)

# Check cache performance
u = response.usage
print(f"Cache write: {u.cache_creation_input_tokens}")
print(f"Cache read:  {u.cache_read_input_tokens}")
print(f"Uncached:    {u.input_tokens}")

How It Works

  • Add cache_control to content blocks
  • First request writes to cache (small premium)
  • Subsequent requests read from cache (90% cheaper)
  • Cache TTL: 5 minutes (resets on use)

Cacheable Locations

  • System prompt blocks
  • User message content blocks
  • Tool definitions
  • Minimum 1024 tokens to cache

Cost Breakdown

Cache write: 1.25x base price. Cache read: 0.1x base price. Net savings on repeated prompts: up to 90%.
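Plugging those multipliers into a quick calculation shows the break-even point. A sketch using Sonnet's $3/MTok input price as the example base:

```python
def caching_cost(prefix_tokens: int, requests: int,
                 base_price: float = 3.0) -> float:
    """Total USD cost of a cached prefix over N requests, using the
    1.25x cache-write / 0.1x cache-read multipliers.
    base_price is $/MTok of input (e.g. $3 for Sonnet)."""
    per_tok = base_price / 1_000_000
    write = prefix_tokens * per_tok * 1.25            # first request writes
    reads = prefix_tokens * per_tok * 0.10 * (requests - 1)
    return write + reads

# 10K-token prefix, 100 requests within the cache TTL
uncached = 10_000 * 3 / 1_000_000 * 100   # $3.00 with no caching
cached = caching_cost(10_000, 100)        # ~$0.33, roughly 89% cheaper
```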

19

Embeddings & Text Classification

While Claude does not have a dedicated embeddings endpoint, it excels at text classification and structured extraction tasks.

Classification with Claude

def classify_text(text: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=100,
        system="""Classify the given text into
exactly one category: positive, negative,
or neutral. Respond with JSON only:
{"sentiment": "...", "confidence": 0.0}""",
        messages=[
            {"role": "user", "content": text}
        ]
    )
    return json.loads(
        response.content[0].text
    )

result = classify_text(
    "This product exceeded my expectations!"
)
# {"sentiment": "positive",
#  "confidence": 0.95}
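json.loads on the raw reply is brittle when the model wraps its answer in markdown fences or adds prose. A defensive parsing sketch (the regex pulls the first-to-last brace span, which suffices for single-object replies):

```python
import json
import re

def parse_json_response(text: str) -> dict:
    """Parse JSON from a model reply, tolerating markdown code fences
    and surrounding prose by extracting the {...} span."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))

parse_json_response('```json\n{"sentiment": "positive", "confidence": 0.95}\n```')
```

Swapping this in for the bare json.loads calls makes both classifiers resilient to formatting drift.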

Multi-label Classification

def tag_support_ticket(ticket: str):
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=200,
        system="""Tag the support ticket.
Return JSON: {"priority": "low|med|high",
"category": "billing|technical|account",
"requires_human": true|false}""",
        messages=[
            {"role": "user",
             "content": ticket}
        ]
    )
    return json.loads(
        response.content[0].text
    )

For Embeddings

Use dedicated embedding models (e.g., Voyage AI via voyageai package) for vector search, RAG, and similarity tasks. Claude pairs well with these for the generation step.

20

API Best Practices

Security

  • Store API keys in env vars or secret managers
  • Never expose keys in client-side code
  • Use a backend proxy for browser apps
  • Rotate keys periodically
  • Set per-key spending limits in Console

Cost Optimization

  • Use Haiku for simple tasks
  • Enable prompt caching for repeated context
  • Set appropriate max_tokens
  • Use Batch API for offline jobs
  • Trim conversation history
  • Pre-count tokens to estimate cost

Prompt Engineering

  • Be specific and explicit
  • Use XML tags for structure
  • Provide examples (few-shot)
  • Put long content in system prompt
  • Request structured output (JSON)
  • Use temperature=0 for determinism

Reliability

  • Use SDK auto-retries (set max_retries)
  • Implement request timeouts
  • Handle all error codes gracefully
  • Log requests and responses for debugging
  • Monitor usage via Anthropic Console

Model Selection Guide

  • Opus — hardest tasks, max quality
  • Sonnet — balanced performance/cost
  • Haiku — fast, cheap, simple tasks
  • Start with Sonnet, upgrade/downgrade as needed
  • Test with evals before switching models
21

Rate Limits & Pricing

Rate limits scale with your usage tier. Pricing is per-token, varying by model.

Pricing (per million tokens)

Model              Input    Output
Claude Opus 4      $15      $75
Claude Sonnet 4    $3       $15
Claude Haiku 3.5   $0.80    $4

Batch API: 50% discount. Prompt caching read: 90% discount.

Usage Tiers

Tier     Spend         RPM
Free     $0            5
Tier 1   $5+ credit    50
Tier 2   $40+ spent    1,000
Tier 3   $200+ spent   2,000
Tier 4   $400+ spent   4,000
Scale    Custom        Custom

Rate Limit Headers

Responses include anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-tokens-limit, anthropic-ratelimit-tokens-remaining, and matching -reset headers (plus retry-after on 429s) for proactive management.
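Those headers make it possible to throttle before hitting a 429. A sketch over a plain dict of headers (names assumed to follow Anthropic's anthropic-ratelimit-* scheme; real code would read them off the HTTP response):

```python
def should_pause(headers: dict, threshold: float = 0.1) -> bool:
    """Return True when the remaining request budget drops below the
    threshold fraction of the per-minute limit."""
    remaining = int(headers.get("anthropic-ratelimit-requests-remaining", 1))
    limit = int(headers.get("anthropic-ratelimit-requests-limit", 1))
    return remaining / max(limit, 1) < threshold

print(should_pause({"anthropic-ratelimit-requests-limit": "50",
                    "anthropic-ratelimit-requests-remaining": "3"}))  # True
```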

22

Summary & Resources

Core Concepts

  • REST API with Messages endpoint
  • API key authentication
  • Stateless — pass full history
  • Content blocks for rich responses
  • Token-based pricing

Key Features

  • Streaming via SSE
  • Tool use / function calling
  • Vision & multimodal input
  • Extended thinking
  • Batch processing
  • Prompt caching

SDKs & Tools

  • Python: pip install anthropic
  • TypeScript: npm i @anthropic-ai/sdk
  • Full async support in both
  • Auto-retries & type safety
  • Anthropic Console for monitoring

Documentation

  • docs.anthropic.com — Full API reference
  • console.anthropic.com — Dashboard & keys
  • github.com/anthropics — SDK source code

Next: Presentation 6

Claude in the Enterprise — deployment patterns, Amazon Bedrock, Google Vertex AI, and production architectures.