PRESENTATION 05 OF 10

The Claude API & SDKs

Building with Anthropic's Developer Platform

REST API Python SDK TypeScript SDK Streaming

Anthropic Claude Series

01

API Overview

The Claude API is a REST API that provides programmatic access to Claude models for building AI-powered applications.

Architecture: Your Application (Python / TS / cURL) → HTTPS → Anthropic API (api.anthropic.com: Messages, Batches) → Claude Models (Opus / Sonnet / Haiku, with Vision, Tools, and Extended Thinking)

Chat & Generation

Build chatbots, content generators, code assistants

Analysis & Vision

Analyze text, images, documents, and data

Tool Use & Agents

Create autonomous agents with function calling

02

Authentication

All API requests require an API key passed via the x-api-key header.

Getting Your Key

  • Visit console.anthropic.com
  • Create an account or sign in
  • Navigate to API Keys section
  • Generate a new key (starts with sk-ant-)
  • Store securely — shown only once

Required Headers

  • x-api-key: YOUR_API_KEY
  • content-type: application/json
  • anthropic-version: 2023-06-01
  • Optional: anthropic-beta for beta features

# cURL example
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,
       "messages":[{"role":"user","content":"Hello, Claude!"}]}'

Never commit API keys to source control. Use environment variables or secret managers.

03

Messages API — The Core Endpoint

The Messages API is the primary interface for interacting with Claude. It uses a structured conversation format.

POST /v1/messages

{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "system": "You are a helpful assistant.",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}

Key Features

  • Structured messages — explicit role/content pairs
  • System prompts — separate from conversation
  • Multi-turn — pass full conversation history
  • Streaming — real-time token delivery
  • Tool use — function calling built in
  • Vision — send images inline

Base URL: https://api.anthropic.com — API version: 2023-06-01

04

Request Anatomy

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a senior developer.",
    temperature=0.7,
    top_p=0.9,
    messages=[
        {
            "role": "user",
            "content": "Explain async/await."
        }
    ],
    metadata={"user_id": "user-123"},
    stop_sequences=["\n\nHuman:"]
)
Parameter        Required   Description
model            Yes        Model identifier
max_tokens       Yes        Max output tokens
messages         Yes        Conversation array
system           No         System prompt string
temperature      No         Randomness (0–1)
top_p            No         Nucleus sampling
top_k            No         Top-k sampling
stop_sequences   No         Custom stop strings
stream           No         Enable streaming
metadata         No         Request metadata
05

Response Structure

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-20250514",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 25,
    "output_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}

Content Blocks

  • text — standard text response
  • tool_use — function call request
  • thinking — extended thinking output

Stop Reasons

  • end_turn — natural completion
  • max_tokens — hit token limit
  • stop_sequence — matched stop string
  • tool_use — wants to call a tool

Usage Tracking

input_tokens and output_tokens drive billing; cache_creation_input_tokens and cache_read_input_tokens report prompt-cache writes and hits.

06

Python SDK

The official Python SDK provides a typed, ergonomic interface to the Claude API.

Installation

pip install anthropic

Basic Usage

import anthropic

# Uses ANTHROPIC_API_KEY env var
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": "Write a haiku about Python."}
    ]
)

print(message.content[0].text)
# Indented with care,
# Whitespace speaks louder than words,
# Pythonic and clean.

Async Support

import anthropic
import asyncio

async def main():
    client = anthropic.AsyncAnthropic()

    message = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {"role": "user",
             "content": "Explain decorators."}
        ]
    )
    print(message.content[0].text)

asyncio.run(main())

SDK Features

  • Full type annotations & autocompletion
  • Automatic retries with backoff
  • Streaming helpers built in
  • Pydantic model responses
07

TypeScript SDK

The official TypeScript SDK provides full type safety and works in Node.js, Deno, and edge runtimes.

Installation

npm install @anthropic-ai/sdk

Basic Usage

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
// Uses ANTHROPIC_API_KEY env var

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Explain TypeScript generics."
    }
  ],
});

console.log(message.content[0].text);

With Explicit Key

const client = new Anthropic({
  apiKey: process.env.MY_CUSTOM_KEY,
  maxRetries: 3,
  timeout: 60_000, // 60 seconds
});

// Access response metadata
const msg = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 512,
  messages: [
    { role: "user", content: "Hello!" }
  ],
});

console.log(`Tokens used:
  In:  ${msg.usage.input_tokens}
  Out: ${msg.usage.output_tokens}`);

Runtime Compatibility

Works with Node.js 18+, Deno, Bun, Cloudflare Workers, and Vercel Edge.

08

Multi-turn Conversations

Claude is stateless — you must pass the full conversation history with each request. Roles must alternate between user and assistant.

conversation = []

def chat(user_input: str) -> str:
    conversation.append({
        "role": "user",
        "content": user_input
    })

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system="You are a helpful tutor.",
        messages=conversation
    )

    assistant_msg = response.content[0].text

    conversation.append({
        "role": "assistant",
        "content": assistant_msg
    })

    return assistant_msg

# Usage
chat("What is recursion?")
chat("Can you give me an example?")
chat("How does the call stack work?")

Role Alternation Rules

  • First message must be user
  • Roles must alternate
  • Cannot have two consecutive user or assistant messages
  • Last message is normally user (a trailing assistant message acts as a prefill)
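A small helper can catch alternation mistakes locally before the API rejects the request. This is an illustrative sketch, not part of the SDK:

```python
def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError if the message list violates the alternation rules."""
    if not messages:
        raise ValueError("messages must not be empty")
    if messages[0]["role"] != "user":
        raise ValueError("first message must be from the user")
    for prev, curr in zip(messages, messages[1:]):
        if prev["role"] == curr["role"]:
            raise ValueError(
                f"consecutive {curr['role']} messages are not allowed"
            )
```

Call it right before `client.messages.create(...)` to fail fast with a readable error.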

Best Practices

  • Trim old messages to stay within context limits
  • Summarize earlier turns if conversation grows long
  • Track token count per turn for cost management
  • Consider sliding window approach
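A sliding window can be sketched with a character budget as a rough stand-in for tokens (~4 characters per token); exact counts would come from the count_tokens API. A minimal, illustrative version:

```python
def trim_history(messages: list[dict], max_chars: int = 40_000) -> list[dict]:
    """Keep the most recent messages whose total content length fits the budget.

    Drops the oldest turns first and always keeps the latest message.
    Character count is only a rough proxy for tokens (~4 chars/token).
    """
    kept, total = [], 0
    for msg in reversed(messages):
        size = len(str(msg["content"]))
        if kept and total + size > max_chars:
            break
        kept.append(msg)
        total += size
    kept.reverse()
    # The window must still open with a user turn
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Pass `trim_history(conversation)` as `messages=` instead of the raw list; the original list stays intact for logging or summarization.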
09

System Prompts

System prompts set context and behavior for Claude. They are separate from the conversation and apply to the entire interaction.

# Simple system prompt
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a senior Python developer. "
           "Respond with clean, well-documented "
           "code. Use type hints throughout.",
    messages=[
        {"role": "user",
         "content": "Write a binary search."}
    ]
)

# System prompt with cache_control
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert analyst...",
            "cache_control": {
                "type": "ephemeral"
            }
        }
    ],
    messages=[...]
)

Effective System Prompts

  • Define role and personality
  • Set output format expectations
  • Specify constraints and boundaries
  • Include domain knowledge or context
  • Define tone and communication style

Tips

  • Be specific — vague prompts give vague results
  • Use XML tags to structure complex instructions
  • Put reference material in system prompt for caching
  • System prompt tokens count toward input tokens
  • Can be a string or array of content blocks
10

Streaming

Streaming delivers tokens in real-time via Server-Sent Events (SSE), reducing perceived latency.

Python Streaming

# Using the stream manager
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": "Tell me a story."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get the final accumulated message
    # (inside the with block, before the stream closes)
    final = stream.get_final_message()
    print(f"\nTokens: {final.usage}")

TypeScript Streaming

const stream = await client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    { role: "user",
      content: "Tell me a story." }
  ],
});

for await (const event of stream) {
  if (event.type === "content_block_delta"
    && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

const finalMsg = await stream.finalMessage();
console.log(finalMsg.usage);

SSE Event Types

message_start → content_block_start → content_block_delta (repeated) → content_block_stop → message_delta → message_stop
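The delta events carry the text; they can be stitched together client-side. A sketch using plain dicts shaped like the SSE payloads (the SDKs expose typed event objects instead):

```python
def assemble_text(events: list[dict]) -> str:
    """Concatenate text_delta payloads from a stream of SSE-style events."""
    parts = []
    for event in events:
        if (event["type"] == "content_block_delta"
                and event["delta"]["type"] == "text_delta"):
            parts.append(event["delta"]["text"])
    return "".join(parts)

events = [
    {"type": "message_start"},
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop"},
    {"type": "message_delta"},
    {"type": "message_stop"},
]
print(assemble_text(events))  # Hello
```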

11

Tool Use (Function Calling)

Claude can call external functions you define. You describe tools with JSON Schema, and Claude decides when and how to use them.

Flow: User request ("What's the weather?") → Claude replies with stop_reason: tool_use and a get_weather({...}) call → your function executes and returns a tool_result → Claude replies with stop_reason: end_turn ("It's 72°F...")

Tool Definition

{
  "name": "get_weather",
  "description": "Get current weather for a city",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {"type":"string","description":"City name"},
      "units": {"type":"string","enum":["celsius","fahrenheit"]}
    },
    "required": ["city"]
  }
}

tool_choice Options

  • {"type": "auto"} — Claude decides (default)
  • {"type": "any"} — must use a tool
  • {"type": "tool", "name": "get_weather"} — use specific tool
  • {"type": "none"} — no tool use
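tool_choice is just another request field. A sketch of the payload shapes as plain dicts (no API call made here; the tool schema is abbreviated, and each dict could be splatted into client.messages.create(**request)):

```python
# Shared request fields (tool schema abbreviated for the example)
base = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tools": [{
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
}

# Let Claude decide (the default behavior)
auto_request = {**base, "tool_choice": {"type": "auto"}}
# Force this specific tool to be called
forced_request = {**base, "tool_choice": {"type": "tool", "name": "get_weather"}}
```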
12

Tool Use — Complete Example

import anthropic, json

client = anthropic.Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
        },
        "required": ["city"]
    }
}]

def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temp": 22, "units": units, "condition": "sunny"}  # Mock

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
)

if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    result = get_weather(**tool_block.input)  # Execute the function

    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": tool_block.id, "content": json.dumps(result)}
    ]})

    final = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
    )
    print(final.content[0].text)  # "The weather in Paris is currently sunny at 22°C."
13

Vision / Multimodal

Claude can analyze images alongside text. Supports base64 and URL image sources.

Base64 Image

import base64

with open("chart.png", "rb") as f:
    img_data = base64.standard_b64encode(
        f.read()
    ).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": img_data
                }
            },
            {
                "type": "text",
                "text": "Describe this chart."
            }
        ]
    }]
)

URL Image

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/img.jpg"
                }
            },
            {
                "type": "text",
                "text": "What do you see?"
            }
        ]
    }]
)

Supported Formats

image/jpeg, image/png, image/gif, image/webp. Max ~20 images per request. Each image costs tokens based on dimensions.

14

Extended Thinking

Extended thinking lets Claude reason step-by-step before responding, producing higher-quality answers for complex tasks.

Enabling Extended Thinking

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Prove that sqrt(2) "
                   "is irrational."
    }]
)

# Response contains thinking blocks
for block in response.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking)
    elif block.type == "text":
        print("ANSWER:", block.text)

Key Parameters

  • budget_tokens — max tokens for thinking (min 1024)
  • max_tokens must be > budget_tokens
  • Thinking tokens count toward output billing
  • temperature cannot be adjusted while thinking is enabled
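The two numeric constraints above can be enforced before the request goes out. An illustrative helper (not part of the SDK):

```python
def thinking_params(budget_tokens: int, max_tokens: int) -> dict:
    """Build a thinking config, enforcing the documented constraints:
    budget_tokens >= 1024 and max_tokens > budget_tokens."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must exceed budget_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

Usage: `thinking=thinking_params(10_000, 16_000)` in the create call above.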

Best For

  • Math proofs and complex reasoning
  • Multi-step analysis and planning
  • Code architecture decisions
  • Tasks requiring careful deliberation

Response Format

Content array includes thinking blocks (internal reasoning) followed by text blocks (final answer).

15

Batch API

The Message Batches API processes large volumes of requests asynchronously at a 50% discount on standard token pricing.

Creating a Batch

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": "Summarize AI safety."}
                ]
            }
        },
        {
            "custom_id": "req-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": "Explain RLHF."}
                ]
            }
        }
    ]
)
print(batch.id)  # batch_abc123

Polling & Results

# Check status
batch = client.messages.batches.retrieve(
    batch.id
)
print(batch.processing_status)
# "in_progress" or "ended"

# Iterate results when complete
if batch.processing_status == "ended":
    for result in client.messages.batches \
            .results(batch.id):
        print(result.custom_id)
        if result.result.type == "succeeded":
            msg = result.result.message
            print(msg.content[0].text)

Batch Limits

  • Up to 100,000 requests per batch
  • Results within 24 hours
  • 50% discount on token pricing
  • Results expire after 29 days
16

Token Counting

Understanding tokenization is essential for cost management and staying within context limits.

Count Tokens API

# Count tokens before sending
count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user",
         "content": "Hello, how are you?"}
    ],
    system="You are helpful."
)
print(count.input_tokens)  # e.g., 18

# Check usage after response
response = client.messages.create(...)
print(f"Input:  {response.usage.input_tokens}")
print(f"Output: {response.usage.output_tokens}")
total_cost = (
    response.usage.input_tokens * 3 / 1_000_000
    + response.usage.output_tokens * 15 / 1_000_000
)
print(f"Cost: ${total_cost:.4f}")

Tokenization Rules

  • ~1 token = ~4 characters of English
  • ~75 words = ~100 tokens
  • Code typically uses more tokens per line
  • Non-English text may use more tokens
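The rules of thumb above translate into a quick local estimator; it is only an approximation, and the count_tokens API gives exact numbers:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate using the ~4 characters/token rule of thumb.
    Use client.messages.count_tokens(...) for exact counts."""
    return max(1, len(text) // 4)

print(estimate_tokens("Hello, how are you?"))  # 19 chars -> 4
```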

Context Windows

Model              Context
Claude Opus 4      200K tokens
Claude Sonnet 4    200K tokens
Claude Haiku 3.5   200K tokens

Cost Control Tips

  • Pre-count tokens to estimate costs
  • Use max_tokens to cap output
  • Trim conversation history proactively
  • Use prompt caching for repeated content
17

Error Handling

Robust error handling with retries and exponential backoff ensures reliable API integration.

Python Error Handling

import anthropic
from anthropic import (
    APIError,
    RateLimitError,
    APIConnectionError
)

client = anthropic.Anthropic(
    max_retries=3  # Auto-retries built in
)

try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role":"user",
                   "content":"Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limited: {e.status_code}")
    # SDK auto-retries with backoff
except APIError as e:
    print(f"API error {e.status_code}: {e}")
except APIConnectionError:
    print("Network connectivity issue")

Common Error Codes

Code   Meaning
400    Invalid request (bad params)
401    Authentication failure
403    Permission denied
404    Resource not found
429    Rate limit exceeded
500    Internal server error
529    API overloaded

Retry Strategy

  • SDK handles retries for 429 and 5xx
  • Exponential backoff with jitter
  • Configure max_retries (default: 2)
  • Respect retry-after header
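For callers bypassing the SDK (e.g. raw HTTP), the same strategy can be hand-rolled. A minimal sketch of retries with full-jitter exponential backoff; the function and parameters here are illustrative, not an SDK API:

```python
import random
import time

def call_with_retries(fn, max_retries: int = 3,
                      base: float = 1.0, cap: float = 60.0):
    """Retry fn() on exceptions, sleeping with full-jitter exponential
    backoff: a random delay in [0, min(cap, base * 2**attempt)].
    (The SDK does this automatically; shown for custom HTTP clients.)"""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # budget exhausted, surface the error
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

In production this would catch only retryable errors (429 and 5xx) and honor a retry-after header when present.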
18

Prompt Caching

Prompt caching lets you reuse large prompt prefixes across requests, reducing latency and cost by up to 90%.

Enabling Caching

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LARGE_REFERENCE_DOC,  # 10K tokens
            "cache_control": {
                "type": "ephemeral"
            }
        }
    ],
    messages=[
        {"role": "user",
         "content": "Summarize section 3."}
    ]
)

# Check cache performance
u = response.usage
print(f"Cache write: {u.cache_creation_input_tokens}")
print(f"Cache read:  {u.cache_read_input_tokens}")
print(f"Uncached:    {u.input_tokens}")

How It Works

  • Add cache_control to content blocks
  • First request writes to cache (small premium)
  • Subsequent requests read from cache (90% cheaper)
  • Cache TTL: 5 minutes (resets on use)

Cacheable Locations

  • System prompt blocks
  • User message content blocks
  • Tool definitions
  • Minimum 1024 tokens to cache

Cost Breakdown

Cache write: 1.25x base price. Cache read: 0.1x base price. Net savings on repeated prompts: up to 90%.
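Plugging those multipliers into a quick calculation shows the break-even point. A sketch using Sonnet's $3/MTok input price as the example base:

```python
def caching_cost(prefix_tokens: int, requests: int,
                 base_price: float = 3.0) -> float:
    """Total USD cost of a cached prefix over N requests, using the
    1.25x cache-write / 0.1x cache-read multipliers.
    base_price is $/MTok of input (e.g. $3 for Sonnet)."""
    per_tok = base_price / 1_000_000
    write = prefix_tokens * per_tok * 1.25            # first request writes
    reads = prefix_tokens * per_tok * 0.10 * (requests - 1)
    return write + reads

# 10K-token prefix, 100 requests within the cache TTL
uncached = 10_000 * 3 / 1_000_000 * 100   # $3.00 with no caching
cached = caching_cost(10_000, 100)        # ~$0.33, roughly 89% cheaper
```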

19

Embeddings & Text Classification

While Claude does not have a dedicated embeddings endpoint, it excels at text classification and structured extraction tasks.

Classification with Claude

def classify_text(text: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=100,
        system="""Classify the given text into
exactly one category: positive, negative,
or neutral. Respond with JSON only:
{"sentiment": "...", "confidence": 0.0}""",
        messages=[
            {"role": "user", "content": text}
        ]
    )
    return json.loads(
        response.content[0].text
    )

result = classify_text(
    "This product exceeded my expectations!"
)
# {"sentiment": "positive",
#  "confidence": 0.95}
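json.loads on the raw reply is brittle when the model wraps its answer in markdown fences or adds prose. A defensive parsing sketch (the regex pulls the first-to-last brace span, which suffices for single-object replies):

```python
import json
import re

def parse_json_response(text: str) -> dict:
    """Parse JSON from a model reply, tolerating markdown code fences
    and surrounding prose by extracting the {...} span."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))

parse_json_response('```json\n{"sentiment": "positive", "confidence": 0.95}\n```')
```

Swapping this in for the bare json.loads calls makes both classifiers resilient to formatting drift.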

Multi-label Classification

def tag_support_ticket(ticket: str):
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=200,
        system="""Tag the support ticket.
Return JSON: {"priority": "low|med|high",
"category": "billing|technical|account",
"requires_human": true|false}""",
        messages=[
            {"role": "user",
             "content": ticket}
        ]
    )
    return json.loads(
        response.content[0].text
    )

For Embeddings

Use dedicated embedding models (e.g., Voyage AI via voyageai package) for vector search, RAG, and similarity tasks. Claude pairs well with these for the generation step.

20

API Best Practices

Security

  • Store API keys in env vars or secret managers
  • Never expose keys in client-side code
  • Use a backend proxy for browser apps
  • Rotate keys periodically
  • Set per-key spending limits in Console

Cost Optimization

  • Use Haiku for simple tasks
  • Enable prompt caching for repeated context
  • Set appropriate max_tokens
  • Use Batch API for offline jobs
  • Trim conversation history
  • Pre-count tokens to estimate cost

Prompt Engineering

  • Be specific and explicit
  • Use XML tags for structure
  • Provide examples (few-shot)
  • Put long content in system prompt
  • Request structured output (JSON)
  • Use temperature=0 for determinism

Reliability

  • Use SDK auto-retries (set max_retries)
  • Implement request timeouts
  • Handle all error codes gracefully
  • Log requests and responses for debugging
  • Monitor usage via Anthropic Console

Model Selection Guide

  • Opus — hardest tasks, max quality
  • Sonnet — balanced performance/cost
  • Haiku — fast, cheap, simple tasks
  • Start with Sonnet, upgrade/downgrade as needed
  • Test with evals before switching models
21

Rate Limits & Pricing

Rate limits scale with your usage tier. Pricing is per-token, varying by model.

Pricing (per million tokens)

Model              Input    Output
Claude Opus 4      $15      $75
Claude Sonnet 4    $3       $15
Claude Haiku 3.5   $0.80    $4

Batch API: 50% discount. Prompt caching read: 90% discount.

Usage Tiers

Tier     Spend         RPM
Free     $0            5
Tier 1   $5+ credit    50
Tier 2   $40+ spent    1,000
Tier 3   $200+ spent   2,000
Tier 4   $400+ spent   4,000
Scale    Custom        Custom

Rate Limit Headers

Responses include anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-tokens-limit, anthropic-ratelimit-tokens-remaining, and matching -reset headers (plus retry-after on 429s) for proactive management.
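Those headers make it possible to throttle before hitting a 429. A sketch over a plain dict of headers (names assumed to follow Anthropic's anthropic-ratelimit-* scheme; real code would read them off the HTTP response):

```python
def should_pause(headers: dict, threshold: float = 0.1) -> bool:
    """Return True when the remaining request budget drops below the
    threshold fraction of the per-minute limit."""
    remaining = int(headers.get("anthropic-ratelimit-requests-remaining", 1))
    limit = int(headers.get("anthropic-ratelimit-requests-limit", 1))
    return remaining / max(limit, 1) < threshold

print(should_pause({"anthropic-ratelimit-requests-limit": "50",
                    "anthropic-ratelimit-requests-remaining": "3"}))  # True
```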

22

Summary & Resources

Core Concepts

  • REST API with Messages endpoint
  • API key authentication
  • Stateless — pass full history
  • Content blocks for rich responses
  • Token-based pricing

Key Features

  • Streaming via SSE
  • Tool use / function calling
  • Vision & multimodal input
  • Extended thinking
  • Batch processing
  • Prompt caching

SDKs & Tools

  • Python: pip install anthropic
  • TypeScript: npm i @anthropic-ai/sdk
  • Full async support in both
  • Auto-retries & type safety
  • Anthropic Console for monitoring

Documentation

  • docs.anthropic.com — Full API reference
  • console.anthropic.com — Dashboard & keys
  • github.com/anthropics — SDK source code

Next: Presentation 6

Claude in the Enterprise — deployment patterns, Amazon Bedrock, Google Vertex AI, and production architectures.