Thinking-capable models emit a thinking field that separates their reasoning trace from the final answer.
Use this capability to audit model steps, animate the model thinking in a UI, or hide the trace entirely when you only need the final response.
Supported models
Enable thinking in API calls
Set the think field on chat or generate requests. Most models accept booleans (true/false).
GPT-OSS instead expects one of low, medium, or high to tune the trace length.
The message.thinking (chat endpoint) or thinking (generate endpoint) field contains the reasoning trace while message.content / response holds the final answer.
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{
    "role": "user",
    "content": "How many letter r are in strawberry?"
  }],
  "think": true,
  "stream": false
}'
from ollama import chat

response = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'How many letter r are in strawberry?'}],
    think=True,
    stream=False,
)

print('Thinking:\n', response.message.thinking)
print('Answer:\n', response.message.content)
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'deepseek-r1',
  messages: [{ role: 'user', content: 'How many letter r are in strawberry?' }],
  think: true,
  stream: false,
})

console.log('Thinking:\n', response.message.thinking)
console.log('Answer:\n', response.message.content)
GPT-OSS requires think to be set to "low", "medium", or "high". Passing true/false is ignored for that model.
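One way to keep the two conventions apart is to validate the think value before sending a request. The sketch below is illustrative only: `build_generate_payload` and `split_generate_response` are hypothetical helpers, not part of the Ollama library, and recognizing GPT-OSS by a `gpt-oss` name prefix is an assumption. It builds a body for the generate endpoint and reads the thinking and response fields described above.

```python
import json

GPT_OSS_LEVELS = ("low", "medium", "high")


def build_generate_payload(model: str, prompt: str, think) -> dict:
    """Build a non-streaming generate request body with a validated think value."""
    # Assumption: GPT-OSS variants are recognizable by their name prefix.
    if model.startswith("gpt-oss"):
        if think not in GPT_OSS_LEVELS:
            raise ValueError(f"{model} expects think to be one of {GPT_OSS_LEVELS}")
    elif not isinstance(think, bool):
        raise ValueError(f"{model} expects think to be True or False")
    return {"model": model, "prompt": prompt, "think": think, "stream": False}


def split_generate_response(body: dict) -> tuple[str, str]:
    """Return (thinking, answer) from a decoded generate response."""
    # Assumption: "thinking" is treated as optional and defaults to "".
    return body.get("thinking", ""), body.get("response", "")


payload = build_generate_payload("gpt-oss", "Draft a headline", "low")
print(json.dumps(payload))
```

The resulting dictionary can be sent as the JSON body of a POST to the generate endpoint on the same host as the curl example, using any HTTP client.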
Stream the reasoning trace
A thinking stream emits reasoning tokens first and answer tokens afterward. Detect the first thinking chunk to render a “thinking” section, then switch to the final reply once message.content arrives.
from ollama import chat

stream = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'What is 17 × 23?'}],
    think=True,
    stream=True,
)

in_thinking = False

for chunk in stream:
    if chunk.message.thinking and not in_thinking:
        in_thinking = True
        print('Thinking:\n', end='')

    if chunk.message.thinking:
        print(chunk.message.thinking, end='')
    elif chunk.message.content:
        if in_thinking:
            print('\n\nAnswer:\n', end='')
            in_thinking = False
        print(chunk.message.content, end='')
import ollama from 'ollama'

async function main() {
  const stream = await ollama.chat({
    model: 'qwen3',
    messages: [{ role: 'user', content: 'What is 17 × 23?' }],
    think: true,
    stream: true,
  })

  let inThinking = false

  for await (const chunk of stream) {
    if (chunk.message.thinking && !inThinking) {
      inThinking = true
      process.stdout.write('Thinking:\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      if (inThinking) {
        process.stdout.write('\n\nAnswer:\n')
        inThinking = false
      }
      process.stdout.write(chunk.message.content)
    }
  }
}

main()
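When you need the full trace and answer as strings rather than printed output, the same branching can be factored into a helper. This is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical names, and the code assumes each chunk exposes message.thinking and message.content as in the loops above. `SimpleNamespace` objects stand in for real chunks so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate a thinking stream into (thinking, answer) strings."""
    thinking_parts, answer_parts = [], []
    for chunk in chunks:
        message = chunk.message
        # Either field may be empty or None on a given chunk.
        if getattr(message, "thinking", None):
            thinking_parts.append(message.thinking)
        if getattr(message, "content", None):
            answer_parts.append(message.content)
    return "".join(thinking_parts), "".join(answer_parts)


def fake_chunk(thinking=None, content=None):
    """Stand-in for a streamed chunk (illustration only)."""
    return SimpleNamespace(message=SimpleNamespace(thinking=thinking, content=content))


demo = [
    fake_chunk(thinking="17 * 23 = 17 * 20 + 17 * 3 "),
    fake_chunk(thinking="= 340 + 51 = 391"),
    fake_chunk(content="17 * 23 = "),
    fake_chunk(content="391"),
]
thinking, answer = collect_stream(demo)
print("Thinking:", thinking)
print("Answer:", answer)
```

With a live model, the iterable returned by a streaming chat call could be passed in place of `demo`.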
CLI quick reference
- Enable thinking for a single run:
  ollama run deepseek-r1 --think "Where should I visit in Lisbon?"
- Disable thinking:
  ollama run deepseek-r1 --think=false "Summarize this article"
- Hide the trace while still using a thinking model:
  ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"
- Inside interactive sessions, toggle with /set think or /set nothink
- GPT-OSS only accepts levels:
  ollama run gpt-oss --think=low "Draft a headline"
  Replace low with medium or high as needed.
Thinking is enabled by default in the CLI and API for supported models.
Think parameter values
For most models:
true: Enable thinking/reasoning
false: Disable thinking (where supported)
For GPT-OSS only:
"low": Minimal reasoning trace
"medium": Moderate reasoning detail
"high": Maximum reasoning detail
Response structure
Thinking models return separate fields for reasoning and final output:
{
  "model": "qwen3",
  "created_at": "2024-12-09T21:07:55.186497Z",
  "message": {
    "role": "assistant",
    "thinking": "Let me count the letters in 'strawberry': s-t-r-a-w-b-e-r-r-y. I see 'r' appears at positions 3, 8, and 9...",
    "content": "There are 3 letter 'r's in the word 'strawberry'."
  },
  "done": true
}
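Reading the two fields from a raw response body can be sketched as follows. The JSON literal mirrors the structure above with shortened text; `split_chat_response` is a hypothetical helper, and treating the thinking field as optional is an assumption for responses produced with thinking disabled.

```python
import json

raw = """
{
  "model": "qwen3",
  "message": {
    "role": "assistant",
    "thinking": "s-t-r-a-w-b-e-r-r-y: 'r' is at positions 3, 8, and 9.",
    "content": "There are 3 letter 'r's in the word 'strawberry'."
  },
  "done": true
}
"""


def split_chat_response(body: str) -> tuple[str, str]:
    """Return (thinking, content) from a non-streaming chat response body."""
    message = json.loads(body)["message"]
    # Assumption: "thinking" can be absent, so fall back to an empty string.
    return message.get("thinking", ""), message["content"]


thinking, content = split_chat_response(raw)
print("Thinking:", thinking)
print("Answer:", content)
```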
Combine thinking with tool calling for transparent reasoning about which tools to use:
from ollama import chat

def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 22°C"

response = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'What\'s the weather in Paris?'}],
    tools=[get_weather],
    think=True
)

print(f"Thinking: {response.message.thinking}")
print(f"Tool calls: {response.message.tool_calls}")
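After the model requests a tool, your code still has to run it. The sketch below shows one possible dispatch step. It assumes each tool call carries function.name and function.arguments (a dict of keyword arguments), and it uses `SimpleNamespace` stand-ins instead of a live response; `run_tool_calls` is a hypothetical helper.

```python
from types import SimpleNamespace


def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 22°C"


def run_tool_calls(tool_calls, available):
    """Call each requested tool by name and collect the results."""
    results = []
    for call in tool_calls or []:  # handles None when no tool was requested
        func = available.get(call.function.name)
        if func is None:
            results.append(f"Unknown tool: {call.function.name}")
            continue
        results.append(func(**call.function.arguments))
    return results


# Stand-in for response.message.tool_calls (illustration only).
fake_calls = [
    SimpleNamespace(
        function=SimpleNamespace(name="get_weather", arguments={"city": "Paris"})
    )
]
results = run_tool_calls(fake_calls, {"get_weather": get_weather})
print(results)
```

Looking tools up in an explicit dictionary keeps the model from invoking anything you did not register.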
Use cases
Debugging model behavior
Inspect the reasoning trace to understand why a model made certain decisions:
from ollama import chat

response = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'Is 9.11 greater than 9.9?'}],
    think=True
)

# Review the thinking to see if the model correctly compared decimals
print(response.message.thinking)
Educational applications
Show students how the model approaches problems:
from ollama import chat

response = chat(
    model='deepseek-r1',
    messages=[{'role': 'user', 'content': 'Solve: 2x + 5 = 13'}],
    think=True
)

print("Step-by-step reasoning:")
print(response.message.thinking)
print("\nFinal answer:")
print(response.message.content)
UI with thinking animation
Show a “thinking” indicator while reasoning is in progress:
const stream = await ollama.chat({
  model: 'qwen3',
  messages: [{ role: 'user', content: 'Plan a trip to Tokyo' }],
  think: true,
  stream: true,
})

let thinkingDisplay = document.getElementById('thinking')
let answerDisplay = document.getElementById('answer')

for await (const chunk of stream) {
  if (chunk.message.thinking) {
    thinkingDisplay.textContent += chunk.message.thinking
  } else if (chunk.message.content) {
    thinkingDisplay.style.display = 'none'
    answerDisplay.textContent += chunk.message.content
  }
}
Tips
- Use thinking mode during development and debugging
- Hide thinking in production if users only need final answers
- Combine with temperature: 0 for more consistent reasoning
- Enable thinking for complex reasoning tasks (math, logic, planning)
- GPT-OSS requires explicit levels (low/medium/high) instead of boolean values
- Thinking adds latency and tokens, so disable it for simple queries where reasoning isn’t needed