PRESENTATION 05 OF 10
Building with Anthropic's Developer Platform
Anthropic Claude Series
The Claude API is a REST API that provides programmatic access to Claude models for building AI-powered applications.
Build chatbots, content generators, code assistants
Analyze text, images, documents, and data
Create autonomous agents with function calling
All API requests require an API key passed via the x-api-key header.
API keys begin with sk-ant-
x-api-key: YOUR_API_KEY
content-type: application/json
anthropic-version: 2023-06-01
anthropic-beta for beta features
# cURL example
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,
       "messages":[{"role":"user","content":"Hello, Claude!"}]}'
Never commit API keys to source control. Use environment variables or secret managers.
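A minimal sketch of the environment-variable approach; `load_api_key` is a hypothetical helper, not part of the SDK (the official clients read `ANTHROPIC_API_KEY` automatically):

```python
import os

def load_api_key() -> str:
    # Hypothetical helper: pull the key from the environment instead of
    # hard-coding it. The official SDK clients do this automatically.
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("Set ANTHROPIC_API_KEY in your environment")
    if not key.startswith("sk-ant-"):
        raise RuntimeError("This does not look like an Anthropic key")
    return key
```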
The Messages API is the primary interface for interacting with Claude. It uses a structured conversation format.
POST /v1/messages
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "system": "You are a helpful assistant.",
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}
Base URL: https://api.anthropic.com — API version: 2023-06-01
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a senior developer.",
    temperature=0.7,
    top_p=0.9,
    messages=[
        {
            "role": "user",
            "content": "Explain async/await."
        }
    ],
    metadata={"user_id": "user-123"},
    stop_sequences=["\n\nHuman:"]
)
| Parameter | Required | Description |
|---|---|---|
| model | Yes | Model identifier |
| max_tokens | Yes | Max output tokens |
| messages | Yes | Conversation array |
| system | No | System prompt string |
| temperature | No | Randomness (0–1) |
| top_p | No | Nucleus sampling |
| top_k | No | Top-k sampling |
| stop_sequences | No | Custom stop strings |
| stream | No | Enable streaming |
| metadata | No | Request metadata |
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-20250514",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 25,
    "output_tokens": 12,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}
Content block types: text — standard text response; tool_use — function call request; thinking — extended thinking output
Stop reasons: end_turn — natural completion; max_tokens — hit token limit; stop_sequence — matched stop string; tool_use — wants to call a tool
Usage: input_tokens and output_tokens for billing. Includes cache hit/miss counts.
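A small illustrative dispatcher for the stop reasons listed above; `handle_stop_reason` is a hypothetical helper, and the return values are just examples of actions an application might take:

```python
def handle_stop_reason(stop_reason: str) -> str:
    # Branch on the stop reasons a Messages API response can carry.
    if stop_reason == "end_turn":
        return "done"
    if stop_reason == "max_tokens":
        return "truncated: consider raising max_tokens"
    if stop_reason == "stop_sequence":
        return "matched a custom stop string"
    if stop_reason == "tool_use":
        return "execute the requested tool and continue"
    return f"unhandled stop_reason: {stop_reason}"
```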
The official Python SDK provides a typed, ergonomic interface to the Claude API.
pip install anthropic
import anthropic
# Uses ANTHROPIC_API_KEY env var
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": "Write a haiku about Python."}
    ]
)
print(message.content[0].text)
# Indented with care,
# Whitespace speaks louder than words,
# Pythonic and clean.
import anthropic
import asyncio
async def main():
    client = anthropic.AsyncAnthropic()
    message = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {"role": "user",
             "content": "Explain decorators."}
        ]
    )
    print(message.content[0].text)

asyncio.run(main())
The official TypeScript SDK provides full type safety and works in Node.js, Deno, and edge runtimes.
npm install @anthropic-ai/sdk
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Uses ANTHROPIC_API_KEY env var
const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Explain TypeScript generics."
    }
  ],
});
console.log(message.content[0].text);
const client = new Anthropic({
  apiKey: process.env.MY_CUSTOM_KEY,
  maxRetries: 3,
  timeout: 60_000, // 60 seconds
});
// Access response metadata
const msg = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 512,
  messages: [
    { role: "user", content: "Hello!" }
  ],
});
console.log(`Tokens used:
In: ${msg.usage.input_tokens}
Out: ${msg.usage.output_tokens}`);
Works with Node.js 18+, Deno, Bun, Cloudflare Workers, and Vercel Edge.
Claude is stateless — you must pass the full conversation history with each request. Roles must alternate between user and assistant.
conversation = []
def chat(user_input: str) -> str:
    conversation.append({
        "role": "user",
        "content": user_input
    })
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system="You are a helpful tutor.",
        messages=conversation
    )
    assistant_msg = response.content[0].text
    conversation.append({
        "role": "assistant",
        "content": assistant_msg
    })
    return assistant_msg
# Usage
chat("What is recursion?")
chat("Can you give me an example?")
chat("How does the call stack work?")
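Because history is resent on every call, long chats eventually hit the context limit. One illustrative strategy (a sketch, not an API feature) is to keep only the most recent turns; an even message count preserves user/assistant alternation:

```python
def trim_history(conversation: list, max_messages: int = 20) -> list:
    # Keep only the most recent messages. Rounding max_messages up to an
    # even number keeps the trimmed list starting on a "user" turn when
    # roles alternate strictly from the first message.
    if max_messages % 2 != 0:
        max_messages += 1
    return conversation[-max_messages:]
```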
First message must be from user. Turns alternate between user or assistant messages. The final message is from user.
System prompts set context and behavior for Claude. They are separate from the conversation and apply to the entire interaction.
# Simple system prompt
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a senior Python developer. "
           "Respond with clean, well-documented "
           "code. Use type hints throughout.",
    messages=[
        {"role": "user",
         "content": "Write a binary search."}
    ]
)

# System prompt with cache_control
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert analyst...",
            "cache_control": {
                "type": "ephemeral"
            }
        }
    ],
    messages=[...]
)
Streaming delivers tokens in real time via Server-Sent Events (SSE), reducing perceived latency.
# Using the stream manager
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user",
         "content": "Tell me a story."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get final message after stream
    final = stream.get_final_message()
    print(f"\nTokens: {final.usage}")
const stream = await client.messages.stream({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    { role: "user",
      content: "Tell me a story." }
  ],
});
for await (const event of stream) {
  if (event.type === "content_block_delta"
      && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}
const finalMsg = await stream.finalMessage();
console.log(finalMsg.usage);
message_start → content_block_start → content_block_delta (repeated) → content_block_stop → message_delta → message_stop
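The delta events above can also be reassembled by hand, without the SDK's text_stream helper. A minimal sketch, where `events` stands in for the parsed SSE events a stream yields:

```python
def collect_text(events) -> str:
    # Concatenate text_delta payloads from content_block_delta events,
    # mirroring the event flow shown above. Other event types
    # (message_start, content_block_stop, ...) are ignored here.
    parts = []
    for event in events:
        if (event.get("type") == "content_block_delta"
                and event.get("delta", {}).get("type") == "text_delta"):
            parts.append(event["delta"]["text"])
    return "".join(parts)
```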
Claude can call external functions you define. You describe tools with JSON Schema, and Claude decides when and how to use them.
{
  "name": "get_weather",
  "description": "Get current weather for a city",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {"type": "string", "description": "City name"},
      "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["city"]
  }
}
Tool choice options:
{"type": "auto"} — Claude decides (default)
{"type": "any"} — must use a tool
{"type": "tool", "name": "get_weather"} — use specific tool
{"type": "none"} — no tool use

import anthropic
import json
client = anthropic.Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
        },
        "required": ["city"]
    }
}]

def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temp": 22, "units": units, "condition": "sunny"}  # Mock
messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.messages.create(
    model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
)

if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    result = get_weather(**tool_block.input)  # Execute the function
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": tool_block.id, "content": json.dumps(result)}
    ]})
    final = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024, tools=tools, messages=messages
    )
    print(final.content[0].text)  # "The weather in Paris is currently sunny at 22°C."
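With several tools, a name-to-function registry keeps the dispatch generic. This is an illustrative pattern, not an SDK feature; `get_weather` here is a local mock duplicated for self-containment:

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temp": 22, "units": units}  # Mock

TOOL_REGISTRY = {"get_weather": get_weather}  # tool name -> callable

def run_tool(name: str, tool_input: dict) -> str:
    # Look up the implementation by the name Claude requested and return
    # a JSON string suitable for a tool_result content block.
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(fn(**tool_input))
```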
Claude can analyze images alongside text. Supports base64 and URL image sources.
import base64
with open("chart.png", "rb") as f:
    img_data = base64.standard_b64encode(
        f.read()
    ).decode("utf-8")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": img_data
                }
            },
            {
                "type": "text",
                "text": "Describe this chart."
            }
        ]
    }]
)
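When the image type is not fixed ahead of time, the media_type for a base64 block like the one above can be inferred from the filename. A sketch using only the standard library; `image_block` is a hypothetical helper:

```python
import base64
import mimetypes

def image_block(path: str) -> dict:
    # Build a base64 image content block, guessing media_type from the
    # file extension (falling back to PNG if the guess fails).
    media_type = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```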
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/img.jpg"
                }
            },
            {
                "type": "text",
                "text": "What do you see?"
            }
        ]
    }]
)
Supported formats: image/jpeg, image/png, image/gif, image/webp. Max ~20 images per request. Each image costs tokens based on dimensions.
Extended thinking lets Claude reason step-by-step before responding, producing higher-quality answers for complex tasks.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Prove that sqrt(2) "
                   "is irrational."
    }]
)
# Response contains thinking blocks
for block in response.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking)
    elif block.type == "text":
        print("ANSWER:", block.text)
budget_tokens — max tokens for thinking (min 1024)
max_tokens must be greater than budget_tokens
temperature cannot be adjusted while thinking is enabled
Content array includes thinking blocks (internal reasoning) followed by text blocks (final answer).
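These parameter constraints can be checked client-side before sending a request. A small illustrative validator (a hypothetical helper, not part of the SDK):

```python
def validate_thinking(max_tokens: int, budget_tokens: int) -> None:
    # Encode the extended-thinking constraints: minimum budget of 1024
    # tokens, and max_tokens must exceed the thinking budget.
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must exceed budget_tokens")
```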
The Message Batches API processes large volumes of requests asynchronously at 50% reduced cost.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "req-1",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": "Summarize AI safety."}
                ]
            }
        },
        {
            "custom_id": "req-2",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": "Explain RLHF."}
                ]
            }
        }
    ]
)
print(batch.id) # batch_abc123
# Check status
batch = client.messages.batches.retrieve(
    batch.id
)
print(batch.processing_status)
# "in_progress" or "ended"
# Iterate results when complete
if batch.processing_status == "ended":
    for result in client.messages.batches.results(batch.id):
        print(result.custom_id)
        if result.result.type == "succeeded":
            msg = result.result.message
            print(msg.content[0].text)
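A polling loop is the usual way to wait for a batch to finish. An illustrative helper, where `fetch` stands in for a call like client.messages.batches.retrieve(batch.id); the interval and limit are arbitrary:

```python
import time

def wait_for_batch(fetch, interval_s: float = 1.0, max_polls: int = 60):
    # Poll until processing_status reaches "ended", or give up after
    # max_polls attempts.
    for _ in range(max_polls):
        batch = fetch()
        if batch.processing_status == "ended":
            return batch
        time.sleep(interval_s)
    raise TimeoutError("batch did not finish within the polling window")
```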
Understanding tokenization is essential for cost management and staying within context limits.
# Count tokens before sending
count = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[
        {"role": "user",
         "content": "Hello, how are you?"}
    ],
    system="You are helpful."
)
print(count.input_tokens)  # e.g., 18
# Check usage after response
response = client.messages.create(...)
print(f"Input: {response.usage.input_tokens}")
print(f"Output: {response.usage.output_tokens}")
# Sonnet 4 rates: $3/M input, $15/M output
total_cost = (
    response.usage.input_tokens * 3 / 1_000_000
    + response.usage.output_tokens * 15 / 1_000_000
)
print(f"Cost: ${total_cost:.4f}")
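The arithmetic above generalizes to a small lookup helper. Prices are USD per million tokens, mirroring the per-model rates quoted in this deck; the short model keys are illustrative, and rates may change:

```python
# USD per million tokens: (input, output). Illustrative snapshot.
PRICES = {
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
    "haiku-3.5": (0.80, 4.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Convert token counts into an estimated dollar cost.
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```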
| Model | Context |
|---|---|
| Claude Opus 4 | 200K tokens |
| Claude Sonnet 4 | 200K tokens |
| Claude Haiku 3.5 | 200K tokens |
Set max_tokens to cap output
Robust error handling with retries and exponential backoff ensures reliable API integration.
import anthropic
from anthropic import (
APIError,
RateLimitError,
APIConnectionError
)
client = anthropic.Anthropic(
    max_retries=3  # Auto-retries built in
)

try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": "Hello"}]
    )
except RateLimitError as e:
    print(f"Rate limited: {e.status_code}")
    # SDK auto-retries with backoff
except APIConnectionError:
    # Catch before APIError, which it subclasses
    print("Network connectivity issue")
except APIError as e:
    print(f"API error {e.status_code}: {e}")
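If you retry outside the SDK (for instance, around a whole workflow rather than a single call), exponential backoff with jitter is the usual pattern. An illustrative generator, not SDK functionality:

```python
import random

def backoff_delays(retries: int = 3, base: float = 1.0, cap: float = 30.0):
    # Yield sleep durations: base * 2^attempt, capped, with jitter to
    # avoid synchronized retries across many clients.
    for attempt in range(retries):
        yield min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```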
| Code | Meaning |
|---|---|
| 400 | Invalid request (bad params) |
| 401 | Authentication failure |
| 403 | Permission denied |
| 404 | Resource not found |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 529 | API overloaded |
SDK retries automatically on 429 and 5xx errors
Configurable via max_retries (default: 2)
Honors the retry-after header
Prompt caching lets you reuse large prompt prefixes across requests, reducing latency and cost by up to 90%.
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LARGE_REFERENCE_DOC,  # 10K tokens
            "cache_control": {
                "type": "ephemeral"
            }
        }
    ],
    messages=[
        {"role": "user",
         "content": "Summarize section 3."}
    ]
)
# Check cache performance
u = response.usage
print(f"Cache write: {u.cache_creation_input_tokens}")
print(f"Cache read: {u.cache_read_input_tokens}")
print(f"Uncached: {u.input_tokens}")
Add cache_control to content blocks
Cache write: 1.25x base price. Cache read: 0.1x base price. Net savings on repeated prompts: up to 90%.
While Claude does not have a dedicated embeddings endpoint, it excels at text classification and structured extraction tasks.
import json

def classify_text(text: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=100,
        system="""Classify the given text into
exactly one category: positive, negative,
or neutral. Respond with JSON only:
{"sentiment": "...", "confidence": 0.0}""",
        messages=[
            {"role": "user", "content": text}
        ]
    )
    return json.loads(
        response.content[0].text
    )
result = classify_text(
"This product exceeded my expectations!"
)
# {"sentiment": "positive",
# "confidence": 0.95}
def tag_support_ticket(ticket: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=200,
        system="""Tag the support ticket.
Return JSON: {"priority": "low|med|high",
"category": "billing|technical|account",
"requires_human": true|false}""",
        messages=[
            {"role": "user",
             "content": ticket}
        ]
    )
    return json.loads(
        response.content[0].text
    )
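Models occasionally wrap JSON in a markdown fence despite "JSON only" instructions, which breaks a bare json.loads. A defensive parsing sketch; `parse_json_reply` is a hypothetical helper:

```python
import json
import re

def parse_json_reply(text: str) -> dict:
    # Strip an optional ```json ... ``` fence before parsing.
    text = text.strip()
    if text.startswith("```"):
        text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text)
    return json.loads(text)
```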
Use dedicated embedding models (e.g., Voyage AI via voyageai package) for vector search, RAG, and similarity tasks. Claude pairs well with these for the generation step.
Set max_tokens appropriately
Use temperature=0 for determinism
Enable retries (max_retries)
Rate limits scale with your usage tier. Pricing is per-token, varying by model.
| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Claude Opus 4 | $15 | $75 |
| Claude Sonnet 4 | $3 | $15 |
| Claude Haiku 3.5 | $0.80 | $4 |
Batch API: 50% discount. Prompt caching read: 90% discount.
| Tier | Spend | RPM |
|---|---|---|
| Free | $0 | 5 |
| Tier 1 | $5+ credit | 50 |
| Tier 2 | $40+ spent | 1,000 |
| Tier 3 | $200+ spent | 2,000 |
| Tier 4 | $400+ spent | 4,000 |
| Scale | Custom | Custom |
Responses include anthropic-ratelimit-requests-limit, anthropic-ratelimit-requests-remaining, anthropic-ratelimit-tokens-limit, anthropic-ratelimit-tokens-remaining, and the corresponding -reset headers for proactive management.
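A sketch of acting on those headers; `headers` is a plain dict of response headers (in the Python SDK these are obtainable via the with_raw_response variant of a call), the header names follow the anthropic-ratelimit-* scheme and should be verified against current docs, and the thresholds are illustrative:

```python
def should_throttle(headers: dict, min_requests: int = 5, min_tokens: int = 2000) -> bool:
    # Pause before the next call when the remaining request or token
    # budget runs low. Missing headers are treated as "plenty left".
    remaining_req = int(headers.get("anthropic-ratelimit-requests-remaining", min_requests + 1))
    remaining_tok = int(headers.get("anthropic-ratelimit-tokens-remaining", min_tokens + 1))
    return remaining_req < min_requests or remaining_tok < min_tokens
```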
pip install anthropic
npm i @anthropic-ai/sdk
Next: Claude in the Enterprise — deployment patterns, Amazon Bedrock, Google Vertex AI, and production architectures.