ollama run

Synopsis

ollama run MODEL [PROMPT]

Description

The run command starts a model and allows you to interact with it. If the model is not already present locally, it will be pulled automatically from the registry. You can use run in two modes:

Interactive mode: No prompt provided, opens a chat interface
Single-shot mode: Prompt provided, returns response and exits
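The two modes can be wrapped in a small shell helper. This is a minimal sketch, assuming a POSIX shell; chat_or_ask is a hypothetical name and not part of Ollama.

```shell
# chat_or_ask MODEL [PROMPT]
# Hypothetical helper: with a prompt, capture the single-shot response;
# without one, hand the terminal over to interactive mode.
chat_or_ask() {
  model=$1
  if [ "$#" -ge 2 ]; then
    # Single-shot mode: the response goes to stdout, so it can be captured.
    answer=$(ollama run "$model" "$2") || return $?
    printf '%s\n' "$answer"
  else
    # Interactive mode: no prompt argument is passed.
    ollama run "$model"
  fi
}

# Usage:
#   chat_or_ask llama3.2 "Why is the sky blue?"
#   chat_or_ask llama3.2
```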

Arguments

MODEL

string

required

Name of the model to run (e.g., llama3.2, mistral, codellama:7b)

PROMPT

string

Optional prompt to send to the model. If omitted, enters interactive mode.

Options

--keepalive

duration

Duration to keep the model loaded in memory (e.g., 5m, 1h)

Default: Server configuration (typically 5 minutes)
0 - Unload immediately after use
-1 - Keep loaded indefinitely

--format

string

Response format. Use json to request JSON output.

ollama run llama3.2 --format json "List 3 programming languages"

--verbose

boolean

default:"false"

Show detailed timing information for the response, including:

Prompt evaluation time
Response generation time
Total tokens processed
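To keep the timing report apart from the answer, the two output streams can be redirected separately. The sketch below assumes the response is written to stdout and the timing report to stderr, which this page does not state; run_with_stats is a hypothetical helper name.

```shell
# run_with_stats MODEL PROMPT ANSWER_FILE STATS_FILE
# Assumption: the response goes to stdout and the --verbose timing
# report goes to stderr; adjust the redirections if your version differs.
run_with_stats() {
  ollama run "$1" --verbose "$2" > "$3" 2> "$4"
}

# Usage:
#   run_with_stats llama3.2 "Why is the sky blue?" answer.txt stats.txt
```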

--nowordwrap

boolean

default:"false"

Disable automatic word wrapping in terminal output

Thinking Mode Options

--think

string|boolean

Enable thinking mode for supported models. Accepts:

true or no value - Enable thinking with default settings
false - Disable thinking
high, medium, low - Set thinking effort level

ollama run deepseek-r1 --think
ollama run deepseek-r1 --think=high

--hidethinking

boolean

default:"false"

Hide the thinking output from display (only show final answer)

Embedding Model Options

--truncate

boolean

default:"true"

For embedding models: truncate inputs that exceed the context length. Set to false to return an error instead.

--dimensions

integer

default:"0"

For embedding models: truncate output embeddings to the specified number of dimensions

Experimental Options

--experimental

boolean

default:"false"

Enable experimental agent loop with tool support

--experimental-yolo

boolean

default:"false"

Skip all tool approval prompts (use with caution)

--experimental-websearch

boolean

default:"false"

Enable web search tool in experimental mode

Examples

Interactive Mode

Start an interactive chat session:

ollama run llama3.2

Single Prompt

Run a one-off prompt:

ollama run llama3.2 "Why is the sky blue?"

Piped Input

Pipe content from stdin:

cat document.txt | ollama run llama3.2 "Summarize this document:"
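The same pattern extends to several files. The sketch below feeds each file on stdin with a redirection instead of cat; summarize_files is a hypothetical helper name.

```shell
# summarize_files MODEL FILE...
# Prints a header per file, then the model's summary of that file.
summarize_files() {
  model=$1
  shift
  for file in "$@"; do
    printf '== %s ==\n' "$file"
    # stdin carries the document; the argument carries the instruction.
    ollama run "$model" "Summarize this document:" < "$file" || return $?
  done
}

# Usage:
#   summarize_files llama3.2 notes.txt report.txt
```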

JSON Output

Request structured JSON responses:

ollama run llama3.2 --format json "List 3 colors"
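In scripts it can help to write the JSON to a file only when the command succeeds. This is a minimal sketch; save_json is a hypothetical helper name, and it does not check that the output is valid JSON.

```shell
# save_json MODEL PROMPT OUTFILE
# Writes the response to OUTFILE only if `ollama run` exits successfully,
# so a failed run never leaves a partial file behind.
save_json() {
  model=$1 prompt=$2 outfile=$3
  tmp=$(mktemp) || return 1
  if ollama run "$model" --format json "$prompt" > "$tmp"; then
    mv "$tmp" "$outfile"
  else
    status=$?
    rm -f "$tmp"
    return "$status"
  fi
}

# Usage:
#   save_json llama3.2 "List 3 colors" colors.json
```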

Control Model Memory

# Keep model loaded for 10 minutes
ollama run llama3.2 --keepalive 10m "Hello"

# Unload immediately after response
ollama run llama3.2 --keepalive 0 "Hello"

# Keep loaded indefinitely
ollama run llama3.2 --keepalive -1
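For a batch of prompts, one option is to keep the model loaded between calls and unload it at the end. The sketch below assumes one prompt per line in a text file and that ollama stop accepts the model name; batch_prompts is a hypothetical helper name.

```shell
# batch_prompts MODEL PROMPT_FILE
# Runs one prompt per line, keeping the model loaded between calls,
# then stops the model once the batch is finished.
batch_prompts() {
  model=$1
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    [ -n "$prompt" ] || continue   # skip blank lines
    # </dev/null keeps `ollama run` from reading the rest of the prompt file.
    ollama run "$model" --keepalive 10m "$prompt" < /dev/null || return $?
  done < "$2"
  # Assumption: `ollama stop MODEL` unloads the model from memory.
  ollama stop "$model"
}

# Usage:
#   batch_prompts llama3.2 prompts.txt
```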

Thinking Models

Use reasoning models and control how their thinking is displayed:

# Enable thinking with default settings
ollama run deepseek-r1 --think "Solve this problem: ..."

# High effort thinking
ollama run deepseek-r1 --think=high "Complex reasoning task"

# Hide thinking output
ollama run deepseek-r1 --think --hidethinking "Just show the answer"

Embedding Models

Generate embeddings:

ollama run nomic-embed-text "Your text here"

Embedding models return JSON arrays of floating-point numbers representing the text embedding.
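A quick way to check the effect of --dimensions is to count the numbers in the returned array. The sketch below assumes the output is a single flat, non-empty JSON array and that the flag is written as --dimensions N; embedding_length is a hypothetical helper name.

```shell
# embedding_length MODEL TEXT [DIMENSIONS]
# Prints how many numbers the returned embedding contains.
# Assumption: output is one flat JSON array such as [0.1,-0.2,0.3].
embedding_length() {
  if [ -n "${3:-}" ]; then
    vector=$(ollama run "$1" --dimensions "$3" "$2") || return $?
  else
    vector=$(ollama run "$1" "$2") || return $?
  fi
  # Element count = number of commas + 1 for a non-empty flat array.
  commas=$(printf '%s' "$vector" | tr -cd ',' | wc -c | tr -d ' ')
  echo $((commas + 1))
}

# Usage:
#   embedding_length nomic-embed-text "Your text here"
#   embedding_length nomic-embed-text "Your text here" 256
```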

Interactive Mode Commands

When in interactive mode, you can use these special commands:

/bye - Exit the session
/clear - Clear conversation history
/show - Display model information
/set - Set session parameters (e.g., /set think, /set nothink)
/load <model> - Load a session or model
Multiline input: Start a message with """ to enter multiline mode and end it with """ to submit

Output Format

Interactive mode displays:

>>> Your prompt here

Model response appears here...

Non-interactive mode outputs the response directly to stdout.

Environment Variables

OLLAMA_HOST

string

default:"http://127.0.0.1:11434"

Ollama server address

OLLAMA_EDITOR

string

Editor to use for multiline input (e.g., vim, nano)

OLLAMA_NOHISTORY

boolean

default:"false"

Disable saving chat history
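These variables can be set for a single invocation by prefixing the command. The sketch below targets a server on another machine and disables history for that session; the address in the usage line is a placeholder, the value true for OLLAMA_NOHISTORY is an assumption based on its boolean type, and private_remote_run is a hypothetical helper name.

```shell
# private_remote_run HOST MODEL [PROMPT]
# Runs `ollama run` against HOST with chat history disabled.
# The variables apply to this one invocation only.
private_remote_run() {
  host=$1
  shift
  OLLAMA_HOST=$host OLLAMA_NOHISTORY=true ollama run "$@"
}

# Usage (placeholder address):
#   private_remote_run http://192.168.1.50:11434 llama3.2
```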

Exit Codes

0 - Success
1 - Model not found or another error occurred
130 - Interrupted by user (Ctrl+C)
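A script can branch on these codes. This is a minimal sketch based only on the three codes listed above; run_checked is a hypothetical helper name and the messages are illustrative.

```shell
# run_checked MODEL PROMPT
# Runs a single prompt and reports failures on stderr,
# returning the original exit code to the caller.
run_checked() {
  status=0
  ollama run "$1" "$2" || status=$?
  case $status in
    0)   ;;
    130) echo "interrupted by user" >&2 ;;
    *)   echo "ollama run failed (exit $status); check the model name" >&2 ;;
  esac
  return "$status"
}

# Usage:
#   run_checked llama3.2 "Why is the sky blue?" || echo "no answer produced"
```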

Related Commands

ollama pull - Download a model without running it
ollama show - Display model information
ollama ps - List currently running models
ollama stop - Stop a running model
