Ollama provides compatibility with the Anthropic Messages API to help you use Ollama with applications designed for Claude, including Claude Code and other Anthropic-compatible tools.

Quick Start

Configure your Anthropic client to point to Ollama:
import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama'  # required but ignored
)

message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,
    messages=[{
        'role': 'user',
        'content': 'Explain quantum entanglement'
    }]
)
print(message.content[0].text)

Supported Endpoints

/v1/messages

Send messages to models and receive responses in Anthropic format.
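
For clients that do not use the Anthropic SDK, the endpoint can also be reached with a plain HTTP POST. The sketch below only builds the request with the Python standard library and does not send it. The helper name `build_messages_request` is illustrative, and the `x-api-key` and `anthropic-version` header names are the ones the Anthropic API uses, which Ollama accepts without validating (see Differences from Anthropic API).

```python
import json
import urllib.request


def build_messages_request(prompt, model='llama3.2',
                           base_url='http://localhost:11434'):
    """Build (but do not send) a POST request for /v1/messages."""
    body = {
        'model': model,
        'max_tokens': 1024,
        'messages': [{'role': 'user', 'content': prompt}],
    }
    return urllib.request.Request(
        url=f'{base_url}/v1/messages',
        data=json.dumps(body).encode('utf-8'),
        headers={
            'content-type': 'application/json',
            'x-api-key': 'ollama',              # accepted but not validated
            'anthropic-version': '2023-06-01',  # accepted but not enforced
        },
        method='POST',
    )


request = build_messages_request('Explain quantum entanglement')
print(request.get_method(), request.full_url)

# To send it against a running Ollama server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)['content'][0]['text'])
```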

Supported Features

  • ✅ Multi-turn conversations
  • ✅ System prompts
  • ✅ Streaming responses
  • ✅ Vision (images)
  • ✅ Tool/function calling
  • ✅ Thinking/extended thinking
  • ✅ Tool results

Request Parameters

| Parameter | Type | Description |
|---|---|---|
| model | string | Model name |
| max_tokens | integer | Maximum tokens to generate |
| messages | array | Conversation messages |
| system | string/array | System prompt |
| temperature | number | Sampling temperature (0-1) |
| top_p | number | Nucleus sampling |
| top_k | integer | Top-k sampling |
| stream | boolean | Enable streaming |
| stop_sequences | array | Stop sequences |
| tools | array | Available tools |
| thinking | object | Extended thinking config |
| tool_choice | object | Force specific tool (not supported; see Not Supported) |
| metadata | object | Request metadata |
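
As a rough illustration of how these parameters fit together, the sketch below assembles a request body and leaves out optional parameters that were not set. The function `build_payload` and its validation rules are assumptions made for this example, not part of Ollama or the Anthropic SDK. It rejects `tool_choice` on the client side only because that parameter is listed under Not Supported.

```python
def build_payload(model, messages, max_tokens, **optional):
    """Assemble a /v1/messages request body (hypothetical helper)."""
    allowed = {'system', 'temperature', 'top_p', 'top_k', 'stream',
               'stop_sequences', 'tools', 'thinking', 'metadata'}
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f'unexpected parameters: {sorted(unknown)}')
    if not messages:
        raise ValueError('messages must contain at least one entry')
    if max_tokens < 1:
        raise ValueError('max_tokens must be a positive integer')
    temperature = optional.get('temperature')
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError('temperature must be between 0 and 1')

    # Required parameters first, then only the optional ones that were set.
    payload = {'model': model, 'max_tokens': max_tokens, 'messages': messages}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_payload(
    'llama3.2',
    [{'role': 'user', 'content': 'Hello'}],
    256,
    temperature=0.2,
    top_k=None,  # unset, so it is left out of the body
)
print(sorted(payload))
```

The resulting dictionary can be passed to the SDK as `client.messages.create(**payload)` or serialized as the JSON body of a raw HTTP request.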

Response Format

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "model": "llama3.2",
  "content": [
    {"type": "text", "text": "Quantum entanglement is..."}
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 150
  }
}
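
A response in this format can also be handled as plain JSON, for example when calling the endpoint without the SDK. The sketch below parses the sample above, joins its text blocks, and adds up the token usage. The helper name `extract_text` is illustrative.

```python
import json

raw = '''
{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "model": "llama3.2",
  "content": [
    {"type": "text", "text": "Quantum entanglement is..."}
  ],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 12, "output_tokens": 150}
}
'''


def extract_text(response):
    """Join the text of every text block; other block types are skipped."""
    return ''.join(
        block['text'] for block in response['content'] if block['type'] == 'text'
    )


response = json.loads(raw)
text = extract_text(response)
total_tokens = response['usage']['input_tokens'] + response['usage']['output_tokens']
print(text)
print(total_tokens)
```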

Basic Message

message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,
    messages=[{
        'role': 'user',
        'content': 'Write a haiku about programming'
    }]
)
print(message.content[0].text)

System Prompts

message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,
    system='You are an expert Python developer.',
    messages=[{
        'role': 'user',
        'content': 'How do I read a file in Python?'
    }]
)

Multi-turn Conversations

message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,
    messages=[
        {'role': 'user', 'content': 'What is 2+2?'},
        {'role': 'assistant', 'content': '2+2 equals 4.'},
        {'role': 'user', 'content': 'What about 2+3?'}
    ]
)

Streaming Responses

with client.messages.stream(
    model='llama3.2',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Count from 1 to 10'}]
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)

Vision (Images)

import base64

with open('image.jpg', 'rb') as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

message = client.messages.create(
    model='llava',
    max_tokens=1024,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': "What's in this image?"},
            {
                'type': 'image',
                'source': {
                    'type': 'base64',
                    'media_type': 'image/jpeg',
                    'data': image_data
                }
            }
        ]
    }]
)
Image URLs are not supported. Only base64-encoded images are currently supported.
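
Because only base64 data is accepted, an image that lives at a URL has to be downloaded and encoded on the client first. The sketch below shows one way to wrap raw image bytes in the block structure used above. The helper `image_block` is hypothetical, and it guesses the media type from the file name with the standard `mimetypes` module.

```python
import base64
import mimetypes


def image_block(filename, data):
    """Build a base64 image content block from raw bytes."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None or not media_type.startswith('image/'):
        raise ValueError(f'cannot determine an image type for {filename!r}')
    return {
        'type': 'image',
        'source': {
            'type': 'base64',
            'media_type': media_type,
            'data': base64.b64encode(data).decode('ascii'),
        },
    }


# The eight-byte PNG signature stands in for real image bytes here.
block = image_block('photo.png', b'\x89PNG\r\n\x1a\n')
print(block['source']['media_type'])
```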

Tool Calling

tools = [{
    'name': 'get_weather',
    'description': 'Get the current weather for a location',
    'input_schema': {
        'type': 'object',
        'properties': {
            'location': {
                'type': 'string',
                'description': 'City name'
            },
            'unit': {
                'type': 'string',
                'enum': ['celsius', 'fahrenheit'],
                'description': 'Temperature unit'
            }
        },
        'required': ['location']
    }
}]

message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,
    tools=tools,
    messages=[{
        'role': 'user',
        'content': 'What is the weather in San Francisco?'
    }]
)

# Check for tool calls
for block in message.content:
    if block.type == 'tool_use':
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
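
Once a `tool_use` block has been found, the application has to run the matching function itself. A minimal sketch of that dispatch step follows. The `TOOL_HANDLERS` registry, the `run_tool` helper, and the stand-in `get_weather` implementation are all assumptions made for illustration.

```python
def get_weather(location, unit='celsius'):
    # Stand-in implementation; a real tool would call a weather service.
    return f'22 degrees {unit} in {location}'


# Maps the tool names declared in `tools` to Python callables.
TOOL_HANDLERS = {'get_weather': get_weather}


def run_tool(name, tool_input):
    """Look up the handler for a tool call and invoke it with its input."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f'Unknown tool: {name}'
    try:
        return handler(**tool_input)
    except TypeError as exc:
        # Typically raised when the model omits or invents an argument.
        return f'Invalid arguments for {name}: {exc}'


print(run_tool('get_weather', {'location': 'San Francisco'}))
```

With the SDK objects from the loop above, the call would be `run_tool(block.name, block.input)`. Returning an error string instead of raising lets the message be passed back to the model as a tool result.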

Tool Results

Provide tool execution results back to the model:
message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,
    tools=tools,
    messages=[
        {'role': 'user', 'content': 'What is the weather in Tokyo?'},
        {
            'role': 'assistant',
            'content': [{
                'type': 'tool_use',
                'id': 'call_123',
                'name': 'get_weather',
                'input': {'location': 'Tokyo', 'unit': 'celsius'}
            }]
        },
        {
            'role': 'user',
            'content': [{
                'type': 'tool_result',
                'tool_use_id': 'call_123',
                'content': '22°C, partly cloudy'
            }]
        }
    ]
)

Thinking Models

Enable extended thinking for reasoning-capable models:
message = client.messages.create(
    model='deepseek-reasoner',
    max_tokens=2048,
    thinking={'type': 'enabled'},
    messages=[{
        'role': 'user',
        'content': 'Solve this logic puzzle: ...'
    }]
)

# Access thinking content
for block in message.content:
    if block.type == 'thinking':
        print(f"Reasoning: {block.thinking}")
    elif block.type == 'text':
        print(f"Answer: {block.text}")

Using with Claude Code

Claude Code can use Ollama as its backend for local, private code assistance.

Quick Setup

# Auto-configure Claude Code with Ollama
ollama launch claude
This command will:
  1. Prompt you to select a model
  2. Configure Claude Code automatically
  3. Launch Claude Code

Manual Setup

Set environment variables and run Claude Code:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_API_KEY=""

claude --model qwen3-coder
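
The same manual setup can be scripted, for instance from a launcher that should not modify the parent shell. The sketch below builds the environment with the three variables above. The helper names are illustrative, and `launch_claude_code` assumes the `claude` executable is on PATH.

```python
import os
import subprocess


def claude_code_env(base_url='http://localhost:11434', base_env=None):
    """Return a copy of the environment with the three variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        'ANTHROPIC_AUTH_TOKEN': 'ollama',
        'ANTHROPIC_BASE_URL': base_url,
        'ANTHROPIC_API_KEY': '',
    })
    return env


def launch_claude_code(model='qwen3-coder'):
    """Run `claude --model <model>` with the Ollama environment applied."""
    return subprocess.run(['claude', '--model', model],
                          env=claude_code_env(), check=False)


# Build the environment only; call launch_claude_code() to start the CLI.
env = claude_code_env(base_env={'PATH': '/usr/bin'})
print(env['ANTHROPIC_BASE_URL'])
```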

Qwen3 Coder

ollama pull qwen3-coder
Excellent for coding tasks (30B params, requires 24GB VRAM)

GLM-4.7 Cloud

ollama pull glm-4.7:cloud
Cloud model for immediate use

MiniMax M2.1

ollama pull minimax-m2.1:cloud
Fast cloud model

DeepSeek Coder

ollama pull deepseek-coder
Specialized for code generation

Streaming Events

When streaming is enabled, Ollama sends Server-Sent Events (SSE) in Anthropic format:

Event Types

| Event | Description |
|---|---|
| message_start | Start of response with initial metadata |
| content_block_start | Start of a content block (text, thinking, tool_use) |
| content_block_delta | Incremental content update |
| content_block_stop | End of a content block |
| message_delta | Message-level updates (stop_reason, usage) |
| message_stop | End of message |
| ping | Keep-alive ping |
| error | Error event |

Delta Types

| Delta Type | Field | Description |
|---|---|---|
| text_delta | text | Text content chunk |
| thinking_delta | thinking | Thinking/reasoning chunk |
| input_json_delta | partial_json | Tool input JSON chunk |

Streaming Example

import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama'
)

with client.messages.stream(
    model='llama3.2',
    max_tokens=1024,
    messages=[{'role': 'user', 'content': 'Explain async/await in Python'}]
) as stream:
    for event in stream:
        if event.type == 'content_block_delta':
            if event.delta.type == 'text_delta':
                print(event.delta.text, end='', flush=True)
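
The example above handles only `text_delta`. To show how the three delta types combine, the sketch below rebuilds complete content blocks from a list of event dictionaries. The events are hand-written samples rather than captured server output, and the code assumes each block event carries the `index` field used by the Anthropic streaming format. The function name `accumulate` is illustrative.

```python
import json


def accumulate(events):
    """Rebuild content blocks from a sequence of streaming event dicts."""
    blocks = {}
    for event in events:
        if event['type'] == 'content_block_start':
            block = dict(event['content_block'])
            block['_json'] = ''  # buffer for input_json_delta chunks
            blocks[event['index']] = block
        elif event['type'] == 'content_block_delta':
            block, delta = blocks[event['index']], event['delta']
            if delta['type'] == 'text_delta':
                block['text'] = block.get('text', '') + delta['text']
            elif delta['type'] == 'thinking_delta':
                block['thinking'] = block.get('thinking', '') + delta['thinking']
            elif delta['type'] == 'input_json_delta':
                block['_json'] += delta['partial_json']
        elif event['type'] == 'content_block_stop':
            block = blocks[event['index']]
            buffered = block.pop('_json')
            if block['type'] == 'tool_use':
                # Tool input is only valid JSON once the block is complete.
                block['input'] = json.loads(buffered) if buffered else {}
        # Other events (message_start, ping, ...) are ignored here.
    return [blocks[i] for i in sorted(blocks)]


events = [
    {'type': 'content_block_start', 'index': 0,
     'content_block': {'type': 'text', 'text': ''}},
    {'type': 'content_block_delta', 'index': 0,
     'delta': {'type': 'text_delta', 'text': 'Checking '}},
    {'type': 'content_block_delta', 'index': 0,
     'delta': {'type': 'text_delta', 'text': 'the weather.'}},
    {'type': 'content_block_stop', 'index': 0},
    {'type': 'content_block_start', 'index': 1,
     'content_block': {'type': 'tool_use', 'id': 'call_123',
                       'name': 'get_weather', 'input': {}}},
    {'type': 'content_block_delta', 'index': 1,
     'delta': {'type': 'input_json_delta', 'partial_json': '{"location": '}},
    {'type': 'content_block_delta', 'index': 1,
     'delta': {'type': 'input_json_delta', 'partial_json': '"Paris"}'}},
    {'type': 'content_block_stop', 'index': 1},
]

blocks = accumulate(events)
print(blocks[0]['text'])
print(blocks[1]['input'])
```

The tool input is parsed only at `content_block_stop` because an individual `partial_json` chunk is not guaranteed to be valid JSON on its own.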

Migration Guide

Switching from Claude to Ollama

Install Ollama

Download and install from ollama.com.

Pull a Model

ollama pull llama3.2

Update Your Code

Change only the base_url and api_key:
# Before (Anthropic Claude)
import os

import anthropic

client = anthropic.Anthropic(
    api_key=os.environ['ANTHROPIC_API_KEY']
)

# After (Ollama)
client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama'  # required but ignored
)

Use Local Models

Replace Claude model names:
# Before
model='claude-3-5-sonnet'

# After
model='llama3.2'

Model Name Aliases

For applications expecting Claude model names:
# Create an alias (shell)
ollama cp llama3.2 claude-3-5-sonnet

# Use the alias (Python)
client.messages.create(
    model='claude-3-5-sonnet',
    ...
)

Differences from Anthropic API

Behavior Differences

  • API Key: Accepted but not validated (use any string)
  • Version Header: anthropic-version header is accepted but not enforced
  • Token Counts: Approximations based on the model’s tokenizer
  • Request IDs: Generated locally, not from Anthropic servers

Not Supported

Token Counting

/v1/messages/count_tokens endpoint

Tool Choice

tool_choice parameter (auto/any/tool/none)

Prompt Caching

cache_control blocks for caching

Batches API

/v1/messages/batches for async processing

PDF Support

document content blocks with PDFs

Citations

citations content blocks
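
Code written for Claude may already set some of these fields. One option is to remove them before the request goes to Ollama, as in the sketch below, which drops `tool_choice` and any `cache_control` entries from a request body. How Ollama reacts when such fields are sent is not covered here, so treat the helper `strip_unsupported` as a cautious, hypothetical pre-processing step.

```python
import copy


def strip_unsupported(payload):
    """Return a copy of a request body without tool_choice or cache_control."""
    cleaned = copy.deepcopy(payload)
    cleaned.pop('tool_choice', None)

    def drop_cache_control(node):
        # Walk nested lists and dicts, removing every cache_control key.
        if isinstance(node, dict):
            node.pop('cache_control', None)
            for value in node.values():
                drop_cache_control(value)
        elif isinstance(node, list):
            for item in node:
                drop_cache_control(item)

    for key in ('system', 'messages', 'tools'):
        drop_cache_control(cleaned.get(key))
    return cleaned


original = {
    'model': 'llama3.2',
    'max_tokens': 256,
    'tool_choice': {'type': 'auto'},
    'system': [{'type': 'text', 'text': 'Be brief.',
                'cache_control': {'type': 'ephemeral'}}],
    'messages': [{'role': 'user', 'content': 'Hello'}],
}
cleaned = strip_unsupported(original)
print(sorted(cleaned))
```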

Partial Support

| Feature | Status |
|---|---|
| Image content | ✅ Base64 / ❌ URLs |
| Extended thinking | ✅ Basic / ⚠️ budget_tokens not enforced |
| Web search | ✅ Built-in tool (Ollama extension) |

Advanced Features

Web Search (Ollama Extension)

Ollama extends Anthropic’s API with built-in web search:
tools = [{
    'type': 'web_search_20250305',
    'name': 'web_search',
    'max_uses': 5
}]

message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,
    tools=tools,
    messages=[{
        'role': 'user',
        'content': 'What are the latest AI developments?'
    }]
)
Web search requires internet connectivity and is subject to Ollama’s cloud service availability.

Content Block Types

Ollama supports the following Anthropic content block types:
  • text - Text content
  • image - Base64-encoded images
  • tool_use - Tool/function calls
  • tool_result - Tool execution results
  • thinking - Reasoning/thinking content
  • server_tool_use - Server-side tool calls (web_search)
  • web_search_tool_result - Web search results

Best Practices

Select models based on your use case:
  • Chat: llama3.2, mistral, gpt-oss:20b
  • Coding: qwen3-coder, deepseek-coder, codellama
  • Vision: llava, bakllava
  • Reasoning: deepseek-reasoner

  • Use appropriate max_tokens to control response length
  • Clear conversation history periodically for long sessions
  • Use system prompts to set consistent behavior

Handle errors gracefully:
try:
    message = client.messages.create(...)
except anthropic.APIError as e:
    print(f"API Error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
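
Some failures are transient, for example a connection error while the local server is still starting. The sketch below retries a call with exponential backoff. The helper `call_with_retries` and its defaults are assumptions made for illustration, and a stand-in function replaces the real request so the example runs without a server.

```python
import time


def call_with_retries(func, retry_on=(ConnectionError,), attempts=3,
                      base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Stand-in for a request that fails twice, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError('server not ready')
    return 'ok'


result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, len(calls))
```

With the SDK, `func` would be a lambda wrapping `client.messages.create`, and `retry_on` could be set to `(anthropic.APIConnectionError,)`.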

Examples

Complete Chatbot

import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama'
)

conversation = []
system_prompt = "You are a helpful, friendly assistant."

while True:
    user_input = input('You: ')
    if user_input.lower() in ['quit', 'exit']:
        break
    
    conversation.append({'role': 'user', 'content': user_input})
    
    response = client.messages.create(
        model='llama3.2',
        max_tokens=1024,
        system=system_prompt,
        messages=conversation
    )
    
    assistant_message = response.content[0].text
    print(f'Assistant: {assistant_message}')
    
    conversation.append({'role': 'assistant', 'content': assistant_message})
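
The loop above sends the whole conversation on every turn, so long sessions keep growing. A minimal sketch of trimming the history before each request follows. The helper `trim_history` and its default limit are illustrative, and it does not account for `tool_use`/`tool_result` pairs, which would need to be kept together.

```python
def trim_history(conversation, max_messages=20):
    """Keep the most recent messages, starting from a user turn."""
    if max_messages < 1:
        raise ValueError('max_messages must be at least 1')
    recent = conversation[-max_messages:]
    # Drop leading assistant turns so the history still opens with a user message.
    while recent and recent[0]['role'] != 'user':
        recent = recent[1:]
    return recent


history = [
    {'role': 'user', 'content': 'q1'},
    {'role': 'assistant', 'content': 'a1'},
    {'role': 'user', 'content': 'q2'},
    {'role': 'assistant', 'content': 'a2'},
    {'role': 'user', 'content': 'q3'},
    {'role': 'assistant', 'content': 'a3'},
]
trimmed = trim_history(history, max_messages=3)
print([message['content'] for message in trimmed])
```

In the chatbot, this would mean passing `messages=trim_history(conversation)` to `client.messages.create`.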

Tool-based Weather Agent

import anthropic

client = anthropic.Anthropic(
    base_url='http://localhost:11434',
    api_key='ollama'
)

def get_weather(location, unit='celsius'):
    # Simulated weather API
    return f"The weather in {location} is 22°{unit[0].upper()}, sunny"

tools = [{
    'name': 'get_weather',
    'description': 'Get current weather for a location',
    'input_schema': {
        'type': 'object',
        'properties': {
            'location': {'type': 'string'},
            'unit': {'type': 'string', 'enum': ['celsius', 'fahrenheit']}
        },
        'required': ['location']
    }
}]

messages = [{'role': 'user', 'content': 'What is the weather in Paris?'}]

while True:
    response = client.messages.create(
        model='llama3.2',
        max_tokens=1024,
        tools=tools,
        messages=messages
    )
    
    if response.stop_reason == 'tool_use':
        # Execute tools
        tool_results = []
        for block in response.content:
            if block.type == 'tool_use':
                result = get_weather(**block.input)
                tool_results.append({
                    'type': 'tool_result',
                    'tool_use_id': block.id,
                    'content': result
                })
        
        # Add assistant's tool use
        messages.append({'role': 'assistant', 'content': response.content})
        # Add tool results
        messages.append({'role': 'user', 'content': tool_results})
    else:
        # Final response
        print(response.content[0].text)
        break

Troubleshooting

Ensure Ollama is running:
ollama serve
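Check that the service is accessible. The endpoint expects POST requests, so an HTTP error status in reply to this GET still confirms that the server is reachable: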
curl http://localhost:11434/v1/messages
Pull the model first:
ollama pull llama3.2
List available models:
ollama list
If responses are slow:
  • Check GPU memory usage
  • Use smaller models for faster inference
  • Reduce max_tokens for shorter responses
  • Close other applications using the GPU
The max_tokens parameter is required in the Anthropic API:
# Always specify max_tokens
message = client.messages.create(
    model='llama3.2',
    max_tokens=1024,  # Required
    messages=[...]
)

Resources

Ollama Models

Browse available models

Anthropic SDK

Anthropic Python library

Claude Code

Claude’s coding assistant

Community

Join the Ollama community