ollama ps - Ollama

Synopsis

ollama ps [MODEL_PREFIX]

Description

The ps command displays all models currently loaded in memory. It shows:

Model names
Model IDs (digest)
Memory usage
Processor allocation (CPU/GPU split)
Context window size
Time until automatic unload

This is useful for monitoring resource usage and for seeing which models can serve requests immediately, without a load delay.

Arguments

MODEL_PREFIX

string

Optional prefix to filter models. Shows only running models whose names start with this prefix.

ollama ps llama  # Show only running llama models

Options

The ps command has no flags or options.

Examples

List All Running Models

Show all currently loaded models:

ollama ps

Output:

NAME                ID          SIZE     PROCESSOR         CONTEXT    UNTIL
llama3.2:latest     8934d96d    4.7 GB   100% GPU          2048       4 minutes
mistral:7b          2e0493f6    4.1 GB   45%/55% CPU/GPU   4096       2 minutes

Filter by Prefix

Show only running models starting with “llama”:

ollama ps llama

Output:

NAME                ID          SIZE     PROCESSOR         CONTEXT    UNTIL
llama3.2:latest     8934d96d    4.7 GB   100% GPU          2048       4 minutes

No Models Running

When no models are loaded:

ollama ps

Output:

NAME    ID    SIZE    PROCESSOR    CONTEXT    UNTIL

Output Format

The output is formatted as a table with these columns:

NAME

string

Full model name including tag (e.g., llama3.2:latest)

ID

string

First 12 characters of the model’s SHA256 digest

SIZE

string

Total memory used by the model (including model weights, KV cache, etc.)

PROCESSOR

string

Where the model is running:

100% GPU - Fully on GPU
100% CPU - Fully on CPU
45%/55% CPU/GPU - Split between CPU and GPU
Unknown - Cannot determine allocation

CONTEXT

integer

Current context window size in tokens

UNTIL

string

Time until the model will be automatically unloaded from memory:

“4 minutes”
“10 seconds”
“Never” - Configured to stay loaded indefinitely
“Stopping…” - Currently being unloaded
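For scripts that need to compare unload times, the UNTIL text can be turned into a number. The following is a minimal sketch that handles only the forms listed above; `until_seconds` is a hypothetical helper name, not part of Ollama, and any other wording (for example an hours value) is reported as unparsed rather than guessed.

```shell
# Convert an UNTIL value into seconds.
# Prints -1 for "Never" and 0 for "Stopping…"; returns 1 for anything
# it does not recognize (assumption: only the forms on this page occur).
until_seconds() {
  local value="$1" n="${1%% *}"   # n = first word, e.g. "4" in "4 minutes"
  case "$value" in
    Never)     echo -1; return 0 ;;
    Stopping*) echo 0;  return 0 ;;
  esac
  # Reject values whose first word is not a plain integer.
  case "$n" in
    ''|*[!0-9]*) return 1 ;;
  esac
  case "$value" in
    *second*) echo "$n" ;;
    *minute*) echo $(( n * 60 )) ;;
    *)        return 1 ;;
  esac
}
```

For example, `until_seconds "4 minutes"` prints `240`.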

Understanding Processor Allocation

The PROCESSOR column shows how model layers are distributed:

100% GPU

PROCESSOR: 100% GPU

All model layers are on the GPU. Best performance for inference.

100% CPU

PROCESSOR: 100% CPU

All model layers are on the CPU. Slower but works without a GPU.

Split CPU/GPU

PROCESSOR: 45%/55% CPU/GPU

Model layers are split between CPU and GPU. Common when:

GPU VRAM is insufficient for the entire model
Multiple models are loaded simultaneously
GPU is being shared with other applications

Unknown

PROCESSOR: Unknown

Processor allocation cannot be determined.
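The PROCESSOR value usually spans two whitespace-separated fields, and the split form lists the CPU share first, so scripts need a little care to read the GPU share. The following is a minimal sketch, assuming the column layout shown in the examples on this page (SIZE as a value plus a unit, then PROCESSOR); `gpu_share` is a hypothetical helper name, not part of Ollama.

```shell
# Read `ollama ps`-style output on stdin and print "<model> <gpu-percent>"
# per row. Assumed field layout: $1 NAME, $2 ID, $3 size, $4 unit,
# $5/$6 PROCESSOR (e.g. "100% GPU", "45%/55% CPU/GPU", or just "Unknown").
gpu_share() {
  awk 'NR > 1 && NF {
    if ($5 == "Unknown")      pct = "unknown"
    else if ($6 == "GPU")     pct = $5 + 0        # "100% GPU" -> 100
    else if ($6 == "CPU")     pct = 0             # fully on CPU
    else if ($6 == "CPU/GPU") {                   # "45%/55% CPU/GPU"
      split($5, parts, "/")
      pct = parts[2] + 0                          # second share is the GPU
    } else                    pct = "unknown"
    print $1, pct
  }'
}
```

Usage: `ollama ps | gpu_share`.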

Model Lifecycle

Models go through these states:

Not loaded: Model exists but is not in memory
Loading: Model is being loaded into memory
Running: Model is loaded and ready (shown by ollama ps)
Stopping: Model is being unloaded (shows “Stopping…”)
Unloaded: Model removed from memory
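A script can observe the move from Running to Unloaded by polling the listing until the model disappears. This is a rough sketch under stated assumptions: `wait_until_unloaded` is a hypothetical helper, it matches the NAME column exactly, and it treats a failed `ollama ps` (for example, no server) the same as "not listed".

```shell
# Poll `ollama ps` once per second until MODEL is no longer listed.
# Usage: wait_until_unloaded MODEL [TIMEOUT_SECONDS]
# Returns 0 once the model is gone, 1 if it is still listed at the timeout.
wait_until_unloaded() {
  local model="$1" timeout="${2:-60}" waited=0
  # awk exits 0 while a row whose NAME equals MODEL is present.
  while ollama ps | awk -v m="$model" \
      'NR > 1 && $1 == m { found = 1 } END { exit !found }'; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, `ollama stop llama3.2 && wait_until_unloaded llama3.2:latest 30` waits up to 30 seconds for the memory to be released.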

Keep-Alive Timer

The UNTIL column shows when a model will be automatically unloaded:

Timer resets each time the model is used
Default is typically 5 minutes of inactivity
Can be configured per-request or server-wide
“Never” means the model stays loaded until manually stopped
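The two configuration routes can be sketched as follows. The server-wide route uses the `OLLAMA_KEEP_ALIVE` variable described on this page. The per-request route shown here relies on the `keep_alive` field of the Ollama HTTP API's `/api/generate` endpoint, which this page does not cover, so treat that part as an assumption to check against the API reference. `keep_alive_body` is a hypothetical helper name.

```shell
# Build a JSON request body that loads MODEL with a per-request keep-alive.
# Usage: keep_alive_body MODEL DURATION   (DURATION like "10m" or "1h")
keep_alive_body() {
  printf '{"model": "%s", "keep_alive": "%s"}\n' "$1" "$2"
}

# Per-request (needs a running server; address is the OLLAMA_HOST default):
#   curl -s http://127.0.0.1:11434/api/generate -d "$(keep_alive_body llama3.2 10m)"
#
# Server-wide default for every model:
#   OLLAMA_KEEP_ALIVE=30m ollama serve
```

After either command, the UNTIL column in `ollama ps` should reflect the new duration.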

Memory Management

Monitor total memory usage:

# Check GPU memory
nvidia-smi

# Check system memory
free -h

# Calculate total Ollama memory (SIZE splits into a value in $3 and a unit in $4)
ollama ps | awk 'NR>1 && NF {v=$3; if ($4=="MB") v/=1000; sum+=v} END {printf "%.1f GB\n", sum}'

Scripting Usage

Use in scripts and automation:

# Check if a model is running (-F matches the name literally, not as a regex)
if ollama ps | grep -qF "llama3.2:latest"; then
    echo "Model is running"
else
    echo "Model is not loaded"
fi

# Count running models
RUNNING_COUNT=$(ollama ps | tail -n +2 | wc -l)
echo "$RUNNING_COUNT models running"

# Get memory usage of a specific model
MEMORY=$(ollama ps | grep "llama3.2" | awk '{print $3, $4}')
echo "llama3.2 using: $MEMORY"

# List models about to unload (< 1 minute)
ollama ps | grep "second"

Common Patterns

Find GPU-Only Models

Find models running entirely on GPU:

ollama ps | grep "100% GPU"

Find Large Memory Users

Find models using more than 5 GB (the size value is field 3 and its unit is field 4):

ollama ps | awk 'NR>1 && $4 == "GB" && $3+0 > 5 {print $1, $3, $4}'

Monitor Unload Times

Refresh the listing every 5 seconds to watch the UNTIL timers count down:

watch -n 5 'ollama ps'

Performance Considerations

GPU vs CPU Performance

Allocation	Typical Speed	Use Case
100% GPU	10-50 tok/s	Best for most workloads
CPU/GPU Split	5-20 tok/s	Large models, limited VRAM
100% CPU	1-5 tok/s	No GPU, or GPU fully utilized

Multiple Models

When running multiple models:

Total memory use is the sum of all loaded models
Loaded models compete for the same GPU and CPU resources
Consider adjusting OLLAMA_MAX_LOADED_MODELS to cap how many models load at once

Environment Variables

OLLAMA_HOST

string

default:"http://127.0.0.1:11434"

Ollama server address to query for running models

OLLAMA_KEEP_ALIVE

duration

default:"5m"

Server-wide default keep-alive time (affects UNTIL column)

OLLAMA_MAX_LOADED_MODELS

integer

default:"1"

Maximum number of models that can be loaded simultaneously

Exit Codes

0 - Success
1 - Error (server not running, etc.)
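Because a failed `ollama ps` prints nothing useful, a script that only greps the output cannot tell "model not loaded" from "server unreachable". The sketch below checks the exit code first; `model_status` and its return values 0, 1, and 2 are illustrative choices, not part of Ollama.

```shell
# Report whether MODEL is loaded.
# Returns 0 if loaded, 1 if not loaded, 2 if `ollama ps` itself failed.
model_status() {
  local listing
  # A non-zero exit from `ollama ps` means the listing cannot be trusted.
  if ! listing=$(ollama ps 2>/dev/null); then
    return 2
  fi
  # Skip the header row and compare the NAME column exactly.
  printf '%s\n' "$listing" |
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}
```

A caller might branch with `model_status llama3.2:latest; case $? in 0) ... ;; 1) ... ;; 2) ... ;; esac`.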

Troubleshooting

Server Not Running

Error: could not connect to ollama server

Solution: Start the Ollama server:

ollama serve

No Models Showing

If no models appear, none are currently loaded. Run a model to load it:

ollama run llama3.2 "Hello"
ollama ps

Model Shows “Stopping…”

The model is being unloaded. This is normal when:

Keep-alive timer expired
You ran ollama stop MODEL
Server is shutting down
Another model needs the memory

Comparison with Other Commands

Command	Purpose
`ollama ps`	Show running models in memory
`ollama list`	Show all available models (not just running)
`ollama show MODEL`	Show detailed info about one model

Related Commands

ollama list - List all available models
ollama stop - Stop a running model
ollama run - Load and run a model
ollama serve - Start the Ollama server (memory-related settings are set through its environment variables)
