Thinking - Ollama

Thinking-capable models emit a thinking field that separates their reasoning trace from the final answer. Use this capability to audit the model's steps, animate the model's thinking in a UI, or hide the trace entirely when you only need the final response.

Supported models

Qwen 3
GPT-OSS (uses the think levels low, medium, and high; the trace cannot be fully disabled)
DeepSeek-v3.1
DeepSeek R1
Browse the latest additions under thinking models

Enable thinking in API calls

Set the think field on chat or generate requests. Most models accept a boolean (true or false). GPT-OSS instead expects one of "low", "medium", or "high", which tunes the length of the trace. The reasoning trace is returned in message.thinking (chat endpoint) or thinking (generate endpoint), while the final answer is in message.content (chat) or response (generate).


```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{
    "role": "user",
    "content": "How many letter r are in strawberry?"
  }],
  "think": true,
  "stream": false
}'
```

```python
from ollama import chat

response = chat(
  model='qwen3',
  messages=[{'role': 'user', 'content': 'How many letter r are in strawberry?'}],
  think=True,
  stream=False,
)

print('Thinking:\n', response.message.thinking)
print('Answer:\n', response.message.content)
```

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'deepseek-r1',
  messages: [{ role: 'user', content: 'How many letter r are in strawberry?' }],
  think: true,
  stream: false,
})

console.log('Thinking:\n', response.message.thinking)
console.log('Answer:\n', response.message.content)
```

GPT-OSS requires think to be set to "low", "medium", or "high". Passing true/false is ignored for that model.
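Because an unsupported think value is silently ignored rather than rejected, a client may want to check it before sending a request. The sketch below is a hypothetical helper, not part of any Ollama library, and rests on two assumptions: GPT-OSS models can be recognized by a name starting with gpt-oss, and every other model takes a boolean.

```python
from typing import Union

# Levels accepted by GPT-OSS, per the note above.
GPT_OSS_LEVELS = {'low', 'medium', 'high'}


def check_think(model: str, think: Union[bool, str]) -> Union[bool, str]:
  """Return `think` unchanged if it suits the model, else raise ValueError.

  Hypothetical helper. Assumes GPT-OSS models share the 'gpt-oss' name
  prefix (e.g. 'gpt-oss:20b') and that all other models take a boolean.
  """
  if model.startswith('gpt-oss'):
    if think not in GPT_OSS_LEVELS:
      raise ValueError(f'{model} expects one of {sorted(GPT_OSS_LEVELS)}, got {think!r}')
    return think
  if not isinstance(think, bool):
    raise ValueError(f'{model} expects true or false for think, got {think!r}')
  return think


print(check_think('gpt-oss:20b', 'high'))  # high
print(check_think('qwen3', True))          # True

try:
  check_think('gpt-oss', True)
except ValueError as err:
  print('Rejected:', err)
```

Failing loudly on the client side turns a silently ignored setting into an error you can see during development.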

Stream the reasoning trace

When streaming, reasoning tokens arrive in message.thinking before the answer tokens arrive in message.content. Detect the first thinking chunk to render a “thinking” section, then switch to the final reply once message.content arrives.


```python
from ollama import chat

stream = chat(
  model='qwen3',
  messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
  think=True,
  stream=True,
)

in_thinking = False

for chunk in stream:
  if chunk.message.thinking and not in_thinking:
    in_thinking = True
    print('Thinking:\n', end='')

  if chunk.message.thinking:
    print(chunk.message.thinking, end='')
  elif chunk.message.content:
    if in_thinking:
      print('\n\nAnswer:\n', end='')
      in_thinking = False
    print(chunk.message.content, end='')
```

```javascript
import ollama from 'ollama'

async function main() {
  const stream = await ollama.chat({
    model: 'qwen3',
    messages: [{ role: 'user', content: 'What is 17 × 23?' }],
    think: true,
    stream: true,
  })

  let inThinking = false

  for await (const chunk of stream) {
    if (chunk.message.thinking && !inThinking) {
      inThinking = true
      process.stdout.write('Thinking:\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      if (inThinking) {
        process.stdout.write('\n\nAnswer:\n')
        inThinking = false
      }
      process.stdout.write(chunk.message.content)
    }
  }
}

main()
```
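If you want to keep the full trace and the full answer after streaming (for logging or auditing, for example), accumulate the two fields separately. The sketch below is hypothetical: collect_stream is not part of the ollama library, and it takes plain dicts shaped like the streamed messages so it runs without a server.

```python
from typing import Iterable, Mapping, Tuple


def collect_stream(chunks: Iterable[Mapping]) -> Tuple[str, str]:
  """Accumulate streamed thinking and content into two strings.

  Hypothetical helper: each chunk is assumed to be a dict shaped like
  {'message': {'thinking': str | None, 'content': str | None}}.
  """
  thinking_parts = []
  content_parts = []
  for chunk in chunks:
    message = chunk.get('message', {})
    # Reasoning tokens arrive in 'thinking' before the answer starts.
    if message.get('thinking'):
      thinking_parts.append(message['thinking'])
    # Answer tokens arrive in 'content'.
    if message.get('content'):
      content_parts.append(message['content'])
  return ''.join(thinking_parts), ''.join(content_parts)


# Simulated chunks standing in for a real stream.
fake_stream = [
  {'message': {'thinking': '17 × 23 = 17 × 20 + 17 × 3', 'content': ''}},
  {'message': {'thinking': ' = 340 + 51 = 391.', 'content': ''}},
  {'message': {'thinking': '', 'content': '17 × 23 = '}},
  {'message': {'thinking': '', 'content': '391'}},
]

thinking, answer = collect_stream(fake_stream)
print('Thinking:', thinking)
print('Answer:', answer)
```

With the ollama client you would read chunk.message.thinking and chunk.message.content instead of dict keys; the accumulation logic stays the same.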

CLI quick reference

Enable thinking for a single run:

```
ollama run deepseek-r1 --think "Where should I visit in Lisbon?"
```

Disable thinking:

```
ollama run deepseek-r1 --think=false "Summarize this article"
```

Hide the trace while still using a thinking model:

```
ollama run deepseek-r1 --hidethinking "Which is bigger, 9.9 or 9.11?"
```

Inside an interactive session, toggle thinking with /set think or /set nothink.

GPT-OSS accepts only the levels low, medium, and high:
```
ollama run gpt-oss --think=low "Draft a headline"
```
Replace low with medium or high as needed.

Thinking is enabled by default in the CLI and API for supported models.
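To script these CLI invocations from Python, you can build the argument list and pass it to subprocess.run. The helper below is hypothetical and uses only the flags shown above (--think, --think=false, --think=low|medium|high, --hidethinking); the example builds the command without running it.

```python
from typing import List, Optional, Union


def build_run_args(model: str, prompt: str,
                   think: Optional[Union[bool, str]] = None,
                   hide_thinking: bool = False) -> List[str]:
  """Build an `ollama run` argument list (hypothetical helper).

  think=None adds no flag, leaving the CLI default (thinking on for
  supported models).
  """
  args = ['ollama', 'run', model]
  if think is True:
    args.append('--think')
  elif think is False:
    args.append('--think=false')
  elif isinstance(think, str):
    # A level such as 'low', 'medium', or 'high' (GPT-OSS).
    args.append(f'--think={think}')
  if hide_thinking:
    args.append('--hidethinking')
  args.append(prompt)
  return args


cmd = build_run_args('gpt-oss', 'Draft a headline', think='low')
print(cmd)
# To actually run it (requires Ollama installed locally):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string avoids having to quote prompts that contain spaces or quotation marks.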

Think parameter values

think: boolean | string

For most models:

true: Enable thinking/reasoning
false: Disable thinking (where supported)

For GPT-OSS only:

"low": Minimal reasoning trace
"medium": Moderate reasoning detail
"high": Maximum reasoning detail

Response structure

Thinking models return separate fields for reasoning and final output:

```json
{
  "model": "qwen3",
  "created_at": "2024-12-09T21:07:55.186497Z",
  "message": {
    "role": "assistant",
    "thinking": "Let me count the letters in 'strawberry': s-t-r-a-w-b-e-r-r-y. I see 'r' appears at positions 3, 8, and 9...",
    "content": "There are 3 letter 'r's in the word 'strawberry'."
  },
  "done": true
}
```
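The chat and generate endpoints return the same two pieces under different names: message.thinking and message.content for chat, thinking and response for generate. This stdlib-only sketch pulls both out of a raw non-streaming JSON body; split_response is a hypothetical helper, not an Ollama API.

```python
import json
from typing import Tuple


def split_response(body: str) -> Tuple[str, str]:
  """Return (thinking, answer) from a non-streaming chat or generate body.

  Hypothetical helper. Missing or null fields come back as ''.
  """
  data = json.loads(body)
  if 'message' in data:
    # /api/chat shape: fields live under "message".
    message = data['message']
    return message.get('thinking') or '', message.get('content') or ''
  # /api/generate shape: fields are top-level.
  return data.get('thinking') or '', data.get('response') or ''


chat_body = '''{
  "model": "qwen3",
  "message": {
    "role": "assistant",
    "thinking": "s-t-r-a-w-b-e-r-r-y has r at positions 3, 8, and 9.",
    "content": "There are 3 letter 'r's in the word 'strawberry'."
  },
  "done": true
}'''

thinking, answer = split_response(chat_body)
print('Thinking:', thinking)
print('Answer:', answer)
```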

Thinking with tool calling

Combine thinking with tool calling for transparent reasoning about which tools to use:

```python
from ollama import chat


def get_weather(city: str) -> str:
  """Get the current weather for a city.

  Args:
    city: The name of the city.
  """
  return f"Weather in {city}: Sunny, 22°C"


response = chat(
  model='qwen3',
  messages=[{'role': 'user', 'content': "What's the weather in Paris?"}],
  tools=[get_weather],  # the client derives the tool schema from the signature and docstring
  think=True,
)

print(f"Thinking: {response.message.thinking}")
print(f"Tool calls: {response.message.tool_calls}")
```
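The example stops at printing the requested tool calls. To complete the loop you would run each requested function and send its result back to the model. This sketch works on plain dicts shaped like the API's tool_calls entries ({'function': {'name': ..., 'arguments': {...}}}) so it runs offline. run_tool_calls is hypothetical, and the {'role': 'tool', 'content': ..., 'tool_name': ...} message shape is an assumption to check against the tool-calling documentation.

```python
from typing import Callable, Dict, List


def get_weather(city: str) -> str:
  """Get the current weather for a city (stub from the example above)."""
  return f"Weather in {city}: Sunny, 22°C"


def run_tool_calls(tool_calls: List[dict],
                   tools: Dict[str, Callable[..., str]]) -> List[dict]:
  """Execute requested tools and build tool-result messages (hypothetical)."""
  results = []
  for call in tool_calls:
    name = call['function']['name']
    args = call['function'].get('arguments', {})
    func = tools.get(name)
    # Report unknown tools back to the model instead of raising.
    output = func(**args) if func else f'Unknown tool: {name}'
    # Assumed message shape for returning a tool result to the model.
    results.append({'role': 'tool', 'content': output, 'tool_name': name})
  return results


calls = [{'function': {'name': 'get_weather', 'arguments': {'city': 'Paris'}}}]
for message in run_tool_calls(calls, {'get_weather': get_weather}):
  print(message)
```

In a full loop you would append the assistant message and these tool messages to messages and call chat again, so the model can reason over the results and write its final answer.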

Use cases

Debugging model behavior

Inspect the reasoning trace to understand why a model made certain decisions:

```python
from ollama import chat

response = chat(
  model='qwen3',
  messages=[{'role': 'user', 'content': 'Is 9.11 greater than 9.9?'}],
  think=True,
)

# Review the thinking to see whether the model compared the numbers as decimals
print(response.message.thinking)
```
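It helps to know both ways this prompt can be read before judging the trace. As decimal numbers, 9.11 is smaller than 9.9; read as version numbers, 9.11 comes after 9.9. One plausible source of a wrong answer is a trace that slips into the second reading. This standard-library check shows both:

```python
from decimal import Decimal


def version_tuple(text: str) -> tuple:
  """Split a dotted string like '9.11' into integer parts (9, 11)."""
  return tuple(int(part) for part in text.split('.'))


# Read as decimal numbers: 9.11 < 9.90
as_decimals = Decimal('9.11') > Decimal('9.9')

# Read as version numbers: minor part 11 > 9
as_versions = version_tuple('9.11') > version_tuple('9.9')

print('9.11 > 9.9 as decimals:', as_decimals)  # False
print('9.11 > 9.9 as versions:', as_versions)  # True
```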

Educational applications

Show students how the model approaches problems:

```python
from ollama import chat

response = chat(
  model='deepseek-r1',
  messages=[{'role': 'user', 'content': 'Solve: 2x + 5 = 13'}],
  think=True,
)

print("Step-by-step reasoning:")
print(response.message.thinking)
print("\nFinal answer:")
print(response.message.content)
```
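For comparison when reading the trace with students, the expected solution takes two steps: subtract 5 from both sides, then divide by 2.

```latex
\begin{aligned}
2x + 5 &= 13 \\
2x &= 13 - 5 = 8 \\
x &= 8 / 2 = 4
\end{aligned}
```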

UI with thinking animation

Show a “thinking” indicator while reasoning is in progress:

```javascript
import ollama from 'ollama/browser'

// Assumes the page has elements with id="thinking" and id="answer"
const thinkingDisplay = document.getElementById('thinking')
const answerDisplay = document.getElementById('answer')

const stream = await ollama.chat({
  model: 'qwen3',
  messages: [{ role: 'user', content: 'Plan a trip to Tokyo' }],
  think: true,
  stream: true,
})

for await (const chunk of stream) {
  if (chunk.message.thinking) {
    // Append reasoning tokens to the thinking panel as they arrive
    thinkingDisplay.textContent += chunk.message.thinking
  } else if (chunk.message.content) {
    // Hide the thinking panel once the answer starts
    thinkingDisplay.style.display = 'none'
    answerDisplay.textContent += chunk.message.content
  }
}
```
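The UI snippet switches from the thinking panel to the answer panel on the first content chunk. Keeping that phase logic separate from the DOM (or any UI toolkit) lets you test it on its own. This Python sketch is hypothetical: ui_events and its event names are illustrative, and chunks are plain dicts shaped like the streamed messages.

```python
from typing import Iterable, Iterator, Mapping, Optional, Tuple

Event = Tuple[str, Optional[str]]


def ui_events(chunks: Iterable[Mapping]) -> Iterator[Event]:
  """Translate streamed chunks into UI events (hypothetical event names)."""
  phase = 'idle'  # idle -> thinking -> answering
  for chunk in chunks:
    message = chunk.get('message', {})
    thinking = message.get('thinking')
    content = message.get('content')
    if thinking:
      if phase == 'idle':
        # First reasoning token: show the "thinking" indicator.
        phase = 'thinking'
        yield ('show_thinking_indicator', None)
      yield ('append_thinking', thinking)
    elif content:
      if phase != 'answering':
        # First answer token: hide the indicator if it was shown.
        if phase == 'thinking':
          yield ('hide_thinking_indicator', None)
        phase = 'answering'
      yield ('append_answer', content)


chunks = [
  {'message': {'thinking': 'Consider the trip length first.'}},
  {'message': {'content': 'Here is a 3-day plan.'}},
]
for event in ui_events(chunks):
  print(event)
```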

Tips

Use thinking mode during development and debugging
Hide thinking in production if users only need final answers
Set temperature to 0 in the request options for more consistent reasoning
Enable thinking for complex reasoning tasks (math, logic, planning)
GPT-OSS requires explicit levels (low/medium/high) instead of boolean values
Thinking adds latency and tokens; disable it for simple queries where reasoning isn’t needed
