Chat - Ollama

Chat enables interactive, multi-turn conversations with language models. The chat endpoint receives the conversation history through the messages array on each request, which lets the model use context from earlier exchanges.

Quick start

The examples below use, in order, the CLI, cURL, Python, and JavaScript.

Start an interactive chat session:

ollama run llama3.2

The CLI automatically maintains conversation history until you exit.

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "Why is the sky blue?"}
  ],
  "stream": false
}'

from ollama import chat

response = chat(
  model='llama3.2',
  messages=[
    {'role': 'user', 'content': 'Why is the sky blue?'}
  ]
)
print(response.message.content)

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)

Multi-turn conversations

Maintain conversation history by appending each message, including the model's reply, to the messages array. The example is shown in Python, then JavaScript:

from ollama import chat

messages = [
  {'role': 'user', 'content': 'What is the capital of France?'}
]

response = chat(model='llama3.2', messages=messages)
messages.append(response.message)

# Follow-up question
messages.append({
  'role': 'user',
  'content': 'What is its population?'
})

response = chat(model='llama3.2', messages=messages)
print(response.message.content)

import ollama from 'ollama'

const messages = [
  { role: 'user', content: 'What is the capital of France?' }
]

let response = await ollama.chat({ model: 'llama3.2', messages })
messages.push(response.message)

// Follow-up question
messages.push({
  role: 'user',
  content: 'What is its population?'
})

response = await ollama.chat({ model: 'llama3.2', messages })
console.log(response.message.content)
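The append-and-resend pattern above can be wrapped in a small helper. The following is a minimal sketch, not part of the Ollama library: the Conversation class, its send parameter, and fake_send are hypothetical names introduced here so the example runs without a server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the messages array and resends all of it on every turn."""

    def __init__(self, send: Callable[[List[Message]], str], system: str = "") -> None:
        # `send` is a hypothetical callable: it receives the full history
        # and returns the assistant's reply text.
        self.send = send
        self.messages: List[Message] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        reply = self.send(self.messages)
        # Store the reply so the next turn can refer back to it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(history: List[Message]) -> str:
    """Stand-in for a real model call; reports how much history it saw."""
    return f"reply to {history[-1]['content']!r} ({len(history)} messages seen)"


convo = Conversation(fake_send)
first = convo.ask("What is the capital of France?")
second = convo.ask("What is its population?")
print(second)
```

To talk to a real model, fake_send could be swapped for a function that calls chat(model='llama3.2', messages=history) and returns response.message.content, as in the examples above.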

Message roles

The chat API supports four message roles:

user: Messages from the user/human
assistant: Messages from the AI model
system: Instructions that guide the model’s behavior
tool: Results from tool/function calls (see Tool Calling)
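A quick client-side check can catch a misspelled role before a request is sent. This is a sketch under stated assumptions: validate_messages is a hypothetical helper, and the rule that every message carries a content field is taken from the parameter description on this page rather than from the server's actual validation.

```python
# The four roles listed above.
VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages):
    """Return a list of problems; an empty list means the array looks fine."""
    problems = []
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {index}: unknown role {role!r}")
        # Assumption of this sketch: every message has a content field.
        if "content" not in message:
            problems.append(f"message {index}: missing content")
    return problems


good = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about Python programming."},
]
bad = [{"role": "bot", "content": "hi"}, {"role": "user"}]

print(validate_messages(good))
print(validate_messages(bad))
```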

The following example sets a system message ahead of the user message, in Python and then JavaScript:

from ollama import chat

messages = [
  {
    'role': 'system',
    'content': 'You are a helpful assistant that speaks like a pirate.'
  },
  {
    'role': 'user',
    'content': 'Tell me about Python programming.'
  }
]

response = chat(model='llama3.2', messages=messages)
print(response.message.content)

import ollama from 'ollama'

const messages = [
  {
    role: 'system',
    content: 'You are a helpful assistant that speaks like a pirate.'
  },
  {
    role: 'user',
    content: 'Tell me about Python programming.'
  }
]

const response = await ollama.chat({ model: 'llama3.2', messages })
console.log(response.message.content)

API parameters

model (string, required): The model name (e.g., llama3.2, qwen3)
messages (array, required): Array of message objects with role and content fields
stream (boolean, default: true): Enable streaming responses (see Streaming)
format (string | object): Response format: "json" for JSON mode or a JSON schema object (see Structured Outputs)
options (object): Model options like temperature, top_p, num_ctx, etc.
keep_alive (duration, default: "5m"): How long to keep the model loaded in memory
tools (array): List of tools available for the model to call (see Tool Calling)
think (boolean | string): Enable thinking/reasoning mode (see Thinking)
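To show how these parameters fit together in one request body, here is a minimal sketch that builds the JSON payload and, optionally, posts it with the standard library. build_chat_payload and post_chat are hypothetical helpers; the URL is the one used in the cURL example above. Note that this sketch defaults stream to False, unlike the API itself.

```python
import json
import urllib.request


def build_chat_payload(model, messages, *, stream=False, options=None,
                       keep_alive=None, response_format=None):
    """Assemble a request body, leaving out optional fields that are not set."""
    # stream defaults to False here for simplicity; the API default is true.
    payload = {"model": model, "messages": messages, "stream": stream}
    if options is not None:
        payload["options"] = options
    if keep_alive is not None:
        payload["keep_alive"] = keep_alive
    if response_format is not None:
        payload["format"] = response_format
    return payload


def post_chat(payload, url="http://localhost:11434/api/chat"):
    """Send a non-streaming request; requires a running Ollama server."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_chat_payload(
    "llama3.2",
    [{"role": "user", "content": "Why is the sky blue?"}],
    options={"temperature": 0},
    keep_alive="10m",
)
print(json.dumps(payload, indent=2))
```

With a server running, post_chat(payload)["message"]["content"] would hold the reply text, following the non-streaming response structure described on this page.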

Response structure


When streaming is enabled (default), the response is a series of JSON objects:

{
  "model": "llama3.2",
  "created_at": "2024-12-09T21:07:55.186497Z",
  "message": {
    "role": "assistant",
    "content": "The "
  },
  "done": false
}

The final message includes metrics:

{
  "model": "llama3.2",
  "created_at": "2024-12-09T21:07:55.186497Z",
  "message": {
    "role": "assistant",
    "content": ""
  },
  "done": true,
  "total_duration": 4648158584,
  "load_duration": 4071084,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 107345000,
  "eval_count": 298,
  "eval_duration": 4289432000
}
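A client typically concatenates the content of each chunk until it sees done set to true. The sketch below assumes the chunks arrive one JSON object per line; collect_stream is a hypothetical helper, and the sample lines are shortened versions of the objects shown above.

```python
import json


def collect_stream(lines):
    """Join streamed chunks into the full text; return (text, final_chunk)."""
    parts = []
    final = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            final = chunk  # the last chunk carries the metrics
            break
    return "".join(parts), final


sample = [
    '{"model": "llama3.2", "message": {"role": "assistant", "content": "The "}, "done": false}',
    '{"model": "llama3.2", "message": {"role": "assistant", "content": "sky"}, "done": false}',
    '{"model": "llama3.2", "message": {"role": "assistant", "content": ""}, "done": true, "eval_count": 298}',
]

text, final = collect_stream(sample)
print(text)
```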

With stream: false, the response is a single JSON object:

{
  "model": "llama3.2",
  "created_at": "2024-12-09T21:07:55.186497Z",
  "message": {
    "role": "assistant",
    "content": "The sky appears blue due to Rayleigh scattering..."
  },
  "done": true,
  "total_duration": 4648158584,
  "load_duration": 4071084,
  "prompt_eval_count": 26,
  "prompt_eval_duration": 107345000,
  "eval_count": 298,
  "eval_duration": 4289432000
}
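The metric fields can be turned into more readable figures such as tokens per second. This page does not state the unit of the duration fields; the sketch below relies on them being nanoseconds, as documented in the Ollama API reference. summarize_metrics is a hypothetical helper, and the input values are copied from the example response above.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def summarize_metrics(response):
    """Convert raw counters and nanosecond durations into per-second rates."""
    prompt_seconds = response["prompt_eval_duration"] / NANOSECONDS_PER_SECOND
    eval_seconds = response["eval_duration"] / NANOSECONDS_PER_SECOND
    return {
        "total_seconds": response["total_duration"] / NANOSECONDS_PER_SECOND,
        "prompt_tokens_per_second": response["prompt_eval_count"] / prompt_seconds,
        "generated_tokens_per_second": response["eval_count"] / eval_seconds,
    }


example = {
    "total_duration": 4648158584,
    "load_duration": 4071084,
    "prompt_eval_count": 26,
    "prompt_eval_duration": 107345000,
    "eval_count": 298,
    "eval_duration": 4289432000,
}

summary = summarize_metrics(example)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```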

Tips

Store the entire messages array to maintain full conversation context
Include a system message at the start to set the assistant’s behavior
Use keep_alive to keep models loaded for faster subsequent requests
Set temperature: 0 in options for more deterministic responses
See Streaming for real-time response rendering
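For the first tip, one way to store the messages array between sessions is to write it to a JSON file. This is a sketch only: to_plain, save_history, and load_history are hypothetical helpers, and the sketch keeps just the role and content of each message, so any other fields would be dropped.

```python
import json
import os
import tempfile


def to_plain(message):
    """Copy role and content from a dict or from an object with those attributes."""
    if isinstance(message, dict):
        return {"role": message["role"], "content": message["content"]}
    return {"role": message.role, "content": message.content}


def save_history(messages, path):
    """Write the conversation to disk as a JSON array."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([to_plain(m) for m in messages], handle, ensure_ascii=False, indent=2)


def load_history(path):
    """Read a conversation written by save_history."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

path = os.path.join(tempfile.mkdtemp(), "history.json")
save_history(history, path)
restored = load_history(path)
print(len(restored), "messages restored")
```

The restored list can be passed back as the messages argument to continue the conversation where it left off.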