Ollama’s web search API can be used to augment models with the latest information to reduce hallucinations and improve accuracy.
Web search is provided as a REST API with deeper tool integrations in the Python and JavaScript libraries. This enables models to conduct long-running research tasks with access to current information.
Authentication
To use Ollama’s web search API, create an API key. This requires a free Ollama account.
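The key is sent as a Bearer token in the Authorization header, as the curl examples on this page show. The following is a minimal sketch of how a client might build those headers. It assumes the key is stored in the OLLAMA_API_KEY environment variable. The helper name `auth_headers` is hypothetical and not part of any Ollama library.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build headers for the web search / web fetch REST endpoints.

    Hypothetical helper: uses the explicit key if given, otherwise falls back
    to the OLLAMA_API_KEY environment variable.
    """
    key = api_key or os.environ.get("OLLAMA_API_KEY")
    if not key:
        raise RuntimeError("Set OLLAMA_API_KEY or pass api_key explicitly")
    return {
        # The key is sent as a Bearer token, as in the curl examples.
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```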
Web search API
Performs a web search for a single query and returns relevant results.
Request
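POST https://ollama.com/api/web_search
Parameters:
- query (string, required): the search query
- max_results (integer, optional): maximum number of results to return (max 10)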
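The sketch below assembles this request using only the Python standard library. It assumes the JSON body carries a `query` string and an optional `max_results` integer capped at 10, and that the key is sent as a Bearer token. The function name is hypothetical. The lower bound of 1 on `max_results` is this sketch's own assumption. The request is built but not sent, so you can inspect it offline.

```python
import json
import urllib.request
from typing import Optional

WEB_SEARCH_URL = "https://ollama.com/api/web_search"


def build_web_search_request(
    query: str, api_key: str, max_results: Optional[int] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the web search endpoint."""
    if not query:
        raise ValueError("query must be a non-empty string")
    body = {"query": query}
    if max_results is not None:
        # Upper bound of 10 comes from the API docs; lower bound of 1 is assumed.
        if not 1 <= max_results <= 10:
            raise ValueError("max_results must be between 1 and 10")
        body["max_results"] = max_results
    return urllib.request.Request(
        WEB_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request: `with urllib.request.urlopen(req) as resp: data = json.load(resp)`.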
Response
{
  "results": [
    {
      "title": "Page title",
      "url": "https://example.com",
      "content": "Relevant content snippet from the page"
    }
  ]
}
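Client code usually turns this JSON into typed records. The following is a minimal sketch that uses a dataclass. The `SearchResult` and `parse_web_search_response` names are hypothetical. Defaulting missing fields to empty strings is an assumption of this sketch, not documented API behavior.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    title: str
    url: str
    content: str


def parse_web_search_response(payload: str) -> List[SearchResult]:
    """Convert a web search JSON body into a list of SearchResult records."""
    data = json.loads(payload)
    return [
        SearchResult(
            # Missing fields fall back to "" (an assumption of this sketch).
            title=item.get("title", ""),
            url=item.get("url", ""),
            content=item.get("content", ""),
        )
        for item in data.get("results", [])
    ]
```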
Examples
Make sure OLLAMA_API_KEY is set in your environment, or pass the key explicitly in the Authorization header.
curl https://ollama.com/api/web_search \
  --header "Authorization: Bearer $OLLAMA_API_KEY" \
  --header 'Content-Type: application/json' \
  -d '{
    "query": "what is ollama?"
  }'
import ollama
response = ollama.web_search("What is Ollama?")
print(response)
More examples: Python web search example
import { Ollama } from "ollama";
const client = new Ollama();
const results = await client.webSearch("what is ollama?");
console.log(JSON.stringify(results, null, 2));
More examples: JavaScript web search example
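If you pass results to a model directly rather than through tool calling, numbering them makes it easy for the model to cite its sources. This sketch assumes each result exposes `title`, `url` and `content` attributes that match the REST response fields. The library's response appears to hold these in a `results` list. The name `format_results_for_prompt` and the 500-character snippet limit are hypothetical choices.

```python
from typing import Iterable


def format_results_for_prompt(results: Iterable, max_chars_per_result: int = 500) -> str:
    """Render search results as numbered sources a model can cite.

    Hypothetical helper: expects objects exposing .title, .url and .content.
    Each snippet is cut to max_chars_per_result characters to save context.
    """
    blocks = []
    for i, r in enumerate(results, start=1):
        snippet = (r.content or "")[:max_chars_per_result]
        blocks.append(f"[{i}] {r.title}\n{r.url}\n{snippet}")
    return "\n\n".join(blocks)
```

Assuming the Python library's response object exposes these fields, a possible usage is `print(format_results_for_prompt(ollama.web_search("What is Ollama?").results))`.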
Web fetch API
Fetches a single web page by URL and returns its content.
Request
POST https://ollama.com/api/web_fetch
Parameters:
- url (string, required): the URL of the page to fetch
Response
{
  "title": "Page title",
  "content": "Main content of the web page",
  "links": [
    "https://example.com/page1",
    "https://example.com/page2"
  ]
}
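The `links` field lets an agent follow up on a page. The following minimal sketch shows one way to choose which links to fetch next: it keeps only unique links on the same host as the fetched page, preserving their order. The response does not echo the page URL, so the caller passes it in. The helper name and the same-host policy are illustrative assumptions, not part of the API.

```python
from typing import List
from urllib.parse import urlparse


def same_site_links(page_url: str, links: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` unique links on the same host as page_url, in order."""
    host = urlparse(page_url).netloc
    seen = set()
    picked: List[str] = []
    for link in links:
        if len(picked) >= limit:
            break
        # Skip off-site links and duplicates.
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            picked.append(link)
    return picked
```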
Examples
curl --request POST \
  --url https://ollama.com/api/web_fetch \
  --header "Authorization: Bearer $OLLAMA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://ollama.com"
  }'
from ollama import web_fetch
result = web_fetch('https://ollama.com')
print(result)
import { Ollama } from "ollama";
const client = new Ollama();
const fetchResult = await client.webFetch("https://ollama.com");
console.log(JSON.stringify(fetchResult, null, 2));
Building a search agent
Use Ollama’s web search API as a tool to build a mini search agent:
from ollama import chat, web_fetch, web_search

available_tools = {'web_search': web_search, 'web_fetch': web_fetch}

messages = [{'role': 'user', 'content': "what is ollama's new engine"}]

while True:
    response = chat(
        model='qwen3:4b',
        messages=messages,
        tools=[web_search, web_fetch],
        think=True,
    )
    if response.message.thinking:
        print('Thinking: ', response.message.thinking)
    if response.message.content:
        print('Content: ', response.message.content)
    messages.append(response.message)

    if response.message.tool_calls:
        print('Tool calls: ', response.message.tool_calls)
        for tool_call in response.message.tool_calls:
            function_to_call = available_tools.get(tool_call.function.name)
            if function_to_call:
                args = tool_call.function.arguments
                result = function_to_call(**args)
                print('Result: ', str(result)[:200] + '...')
                # Truncate the result to roughly 2000 tokens (~4 characters per token) for limited context lengths
                messages.append({'role': 'tool', 'content': str(result)[:2000 * 4], 'tool_name': tool_call.function.name})
            else:
                messages.append({'role': 'tool', 'content': f'Tool {tool_call.function.name} not found', 'tool_name': tool_call.function.name})
    else:
        # No tool calls: the model has produced its final answer
        break
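You can factor the tool-dispatch step in this loop into a function and test it without network access. This is a sketch with the hypothetical name `run_tool_call`. It keeps the loop's behavior: an unknown tool name produces an error message instead of raising, and output is truncated to about 2000 tokens at a rough 4 characters per token. In the usage test, a fake tool stands in for `web_search`.

```python
from typing import Any, Callable, Dict, Mapping


def run_tool_call(
    name: str,
    arguments: Mapping[str, Any],
    tools: Mapping[str, Callable[..., Any]],
    max_chars: int = 2000 * 4,
) -> Dict[str, str]:
    """Execute one tool call and wrap its output as a 'tool' chat message."""
    func = tools.get(name)
    if func is None:
        # Report the missing tool back to the model instead of failing.
        return {"role": "tool", "content": f"Tool {name} not found", "tool_name": name}
    result = func(**arguments)
    # Truncate long results so they fit a limited context window.
    return {"role": "tool", "content": str(result)[:max_chars], "tool_name": name}
```

Inside the loop, the `if function_to_call: ... else: ...` branch could then become `messages.append(run_tool_call(tool_call.function.name, tool_call.function.arguments, available_tools))`.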
Example output:
Thinking: Okay, the user is asking about Ollama's new engine. I need to figure out what they're referring to...
Tool calls: [ToolCall(function=Function(name='web_search', arguments={'max_results': 3, 'query': 'Ollama new engine'}))]
Result: results=[WebSearchResult(content='# New model scheduling\n\n## September 23, 2025\n\nOllama now includes a significantly improved model scheduling system...
Thinking: Okay, the user asked about Ollama's new engine. Let me look at the search results...
Content: Ollama has introduced two key updates to its engine, both released in 2025:
1. **Enhanced Model Scheduling (September 23, 2025)**
   - Precision Memory Management
   - Performance Gains: 85.54 tokens/s vs 52.02 tokens/s
   - Multi-GPU Support
2. **Multimodal Engine (May 15, 2025)**
   - Vision Support for models like llama4:scout, gemma3
   - Multimodal tasks including image identification
Context length and agents
Web search results can contain thousands of tokens, so we recommend increasing the model's context length to at least 32,000 tokens. Search agents work best at full context length, and Ollama’s cloud models already run at their full context length.
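from ollama import chat, web_search

messages = [{'role': 'user', 'content': "what is ollama's new engine"}]
response = chat(
    model='qwen3:4b',
    messages=messages,
    tools=[web_search],
    options={'num_ctx': 32768},  # Increase the context length to 32K tokens
)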
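A rough character budget for tool output shows why context length matters. This sketch uses the same ~4 characters per token heuristic as the agent example. That heuristic is an approximation, because real tokenizers vary by model and language. The function name and the `reserved_tokens` parameter are hypothetical. `reserved_tokens` stands for whatever the prompt, earlier turns and the reply are expected to use.

```python
def tool_output_char_budget(num_ctx: int, reserved_tokens: int, chars_per_token: float = 4.0) -> int:
    """Estimate how many characters of tool output fit in the context window.

    reserved_tokens covers the system prompt, conversation so far and the
    model's reply; the rest of num_ctx is converted to characters.
    """
    remaining = num_ctx - reserved_tokens
    if remaining <= 0:
        return 0
    return int(remaining * chars_per_token)
```

For example, with `num_ctx=32768` and 8192 tokens reserved, roughly 98,304 characters remain for search and fetch results.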
MCP Server integration
You can enable web search in any MCP client through the Python MCP server.
Cline
Add this configuration to Cline’s MCP settings:
{
  "mcpServers": {
    "web_search_and_fetch": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "path/to/web-search-mcp.py"],
      "env": { "OLLAMA_API_KEY": "your_api_key_here" }
    }
  }
}
Codex
Add this to ~/.codex/config.toml:
[mcp_servers.web_search]
command = "uv"
args = ["run", "path/to/web-search-mcp.py"]
env = { "OLLAMA_API_KEY" = "your_api_key_here" }
Other integrations
Ollama can be integrated into most tools through:
- Direct integration of Ollama’s API
- Python / JavaScript libraries
- OpenAI-compatible API
- MCP server integration
Tips
- Use web search for queries that require current information
- Truncate search results to fit within model context limits
- Combine with thinking mode for better reasoning about search results
- Use web_fetch to get full page content when needed
- Set an appropriate max_results to balance detail and context usage
- Cache search results when possible to reduce API calls
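The following is a minimal sketch of the caching tip: an in-memory cache with a time-to-live, keyed on the normalized query and `max_results`. `CachedSearch` is a hypothetical wrapper. It assumes the wrapped function accepts `(query, max_results=...)`, as the `web_search` tool-call arguments suggest. The 10-minute TTL, the default of 3 results and the case-insensitive matching are this sketch's own choices. The injectable clock lets you test it without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedSearch:
    """Wrap a search function with an in-memory time-to-live cache."""

    def __init__(
        self,
        search_fn: Callable[..., Any],
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps (normalized query, max_results) -> (timestamp, result).
        self._cache: Dict[Tuple[str, int], Tuple[float, Any]] = {}

    def __call__(self, query: str, max_results: int = 3) -> Any:
        key = (query.strip().lower(), max_results)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # Fresh cached result: no API call.
        result = self._search_fn(query, max_results=max_results)
        self._cache[key] = (now, result)
        return result
```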