
Synopsis

ollama ps [MODEL_PREFIX]

Description

The ps command displays all models currently loaded in memory. It shows:
  • Model names
  • Model IDs (digest)
  • Memory usage
  • Processor allocation (CPU/GPU split)
  • Context window size
  • Time until automatic unload
This is useful for monitoring resource usage and for seeing which models can serve requests immediately, without a load delay.

Arguments

MODEL_PREFIX
string
Optional prefix to filter models. Shows only running models whose names start with this prefix.
ollama ps llama  # Show only running llama models
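The same filter can be applied offline to output you saved earlier (for example, in a log). A minimal sketch, assuming the CLI performs a plain, case-sensitive prefix match on the NAME column; `filter_prefix` is a hypothetical helper, not part of Ollama:

```shell
# Hypothetical helper: filter saved `ollama ps` output (read on stdin) by a
# NAME prefix. Assumes a literal, case-sensitive prefix match (index(), not a
# regex); the header row is always kept, and an empty prefix keeps every row.
filter_prefix() {
    awk -v p="$1" 'NR == 1 || p == "" || index($1, p) == 1'
}

# Example, on output captured earlier with `ollama ps > ps.log`:
# filter_prefix llama < ps.log
```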

Options

The ps command has no command-specific flags (only the standard -h/--help).

Examples

List All Running Models

Show all currently loaded models:
ollama ps
Output:
NAME                ID          SIZE     PROCESSOR         CONTEXT    UNTIL
llama3.2:latest     8934d96d    4.7 GB   100% GPU          2048       4 minutes
mistral:7b          2e0493f6    4.1 GB   45%/55% CPU/GPU   4096       2 minutes

Filter by Prefix

Show only running models starting with “llama”:
ollama ps llama
Output:
NAME                ID          SIZE     PROCESSOR         CONTEXT    UNTIL
llama3.2:latest     8934d96d    4.7 GB   100% GPU          2048       4 minutes

No Models Running

When no models are loaded:
ollama ps
Output:
NAME    ID    SIZE    PROCESSOR    CONTEXT    UNTIL

Output Format

The output is formatted as a table with these columns:
NAME
string
Full model name including tag (e.g., llama3.2:latest)
ID
string
First 12 characters of the model’s SHA256 digest
SIZE
string
Total memory used by the model (including model weights, KV cache, etc.)
PROCESSOR
string
Where the model is running:
  • 100% GPU - Fully on GPU
  • 100% CPU - Fully on CPU
  • 45%/55% CPU/GPU - Split between CPU and GPU
  • Unknown - Cannot determine allocation
CONTEXT
integer
Current context window size in tokens
UNTIL
string
Time until the model will be automatically unloaded from memory:
  • “4 minutes”
  • “10 seconds”
  • “Never” - Configured to stay loaded indefinitely
  • “Stopping…” - Currently being unloaded
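Because SIZE, PROCESSOR, and UNTIL contain spaces, splitting rows on whitespace is fragile. A minimal sketch that instead slices each row at the character offsets of the header row, assuming cells are left-aligned under their headers as in the sample output above and that header names are unique substrings of the header line; `ps_column` is a hypothetical helper:

```shell
# Hypothetical helper: print one column of `ollama ps` output read on stdin.
# Assumes cells are left-aligned under their headers, so each header's
# character offset marks where its cells start. Exits 1 for an unknown header.
ps_column() {
    awk -v col="$1" '
        NR == 1 {
            start = index($0, col)
            if (start == 0) exit 1              # header not found
            rest = substr($0, start + length(col))
            # The next header begins at the first non-space after this one.
            if (match(rest, /[^ ]/)) {
                width = length(col) + RSTART - 1
            } else {
                width = 0                       # last column: take rest of line
            }
            next
        }
        NF == 0 { next }                        # skip blank lines
        {
            cell = width ? substr($0, start, width) : substr($0, start)
            sub(/[ \t]+$/, "", cell)            # trim trailing padding
            print cell
        }
    '
}
```

For example, `ollama ps | ps_column UNTIL` prints one UNTIL value per loaded model.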

Understanding Processor Allocation

The PROCESSOR column reflects how much of the model's memory is held in GPU VRAM versus system RAM, which follows from how its layers are distributed:

100% GPU

PROCESSOR: 100% GPU
All model layers are on the GPU. Best performance for inference.

100% CPU

PROCESSOR: 100% CPU
All model layers are on the CPU. Slower but works without a GPU.

Split CPU/GPU

PROCESSOR: 45%/55% CPU/GPU
Model layers are split between CPU and GPU. Common when:
  • GPU VRAM is insufficient for the entire model
  • Multiple models are loaded simultaneously
  • GPU is being shared with other applications

Unknown

PROCESSOR: Unknown
Processor allocation cannot be determined.
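To make the split concrete, here is a sketch of how a label such as 45%/55% CPU/GPU could be derived from a model's total memory and the portion resident in VRAM. It assumes the percentages describe memory rather than layer counts and that the CPU share is rounded to the nearest percent; `processor_label` is a hypothetical helper, not Ollama's own code:

```shell
# Hypothetical helper: processor_label SIZE_BYTES VRAM_BYTES
# Assumes the label is based on the share of memory in VRAM, with the CPU
# share rounded to the nearest whole percent.
processor_label() {
    awk -v size="$1" -v vram="$2" 'BEGIN {
        s = size + 0; v = vram + 0            # force numeric comparison
        if (v == 0)                 print "100% CPU"
        else if (v == s)            print "100% GPU"
        else if (v > s || s == 0)   print "Unknown"
        else {
            cpu = int((s - v) / s * 100 + 0.5)  # round half up
            printf "%d%%/%d%% CPU/GPU\n", cpu, 100 - cpu
        }
    }'
}
```

For example, `processor_label 100 55` prints `45%/55% CPU/GPU`.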

Model Lifecycle

Models go through these states:
  1. Not loaded: Model exists but is not in memory
  2. Loading: Model is being loaded into memory
  3. Running: Model is loaded and ready (shown by ollama ps)
  4. Stopping: Model is being unloaded (shows “Stopping…”)
  5. Unloaded: Model removed from memory

Keep-Alive Timer

The UNTIL column shows when a model will be automatically unloaded:
  • Timer resets each time the model is used
  • The default is 5 minutes of inactivity (OLLAMA_KEEP_ALIVE=5m)
  • Can be set per request (the keep_alive API parameter) or server-wide (OLLAMA_KEEP_ALIVE)
  • “Never” means the model stays loaded until manually stopped
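The keep-alive can also be set on a single request through the API's `keep_alive` field, which accepts a duration string such as "10m", a number of seconds, a negative value to keep the model loaded indefinitely, or 0 to unload it. A sketch that builds such a request body; `keepalive_request` is a hypothetical helper, and the server address below assumes the default OLLAMA_HOST:

```shell
# Hypothetical helper: keepalive_request MODEL KEEP_ALIVE
# Prints a JSON body for POST /api/generate with no prompt, which loads MODEL
# and sets its keep-alive. Values made only of digits and "-" (e.g. -1, 0,
# 3600) are emitted as JSON numbers; anything else (e.g. 10m) as a string.
keepalive_request() {
    model=$1
    keep=$2
    case $keep in
        ''|*[!0-9-]*) keep_json="\"$keep\"" ;;   # duration string: quote it
        *)            keep_json=$keep ;;          # number of seconds: bare
    esac
    printf '{"model": "%s", "keep_alive": %s}\n' "$model" "$keep_json"
}
```

Send it with `curl http://127.0.0.1:11434/api/generate -d "$(keepalive_request llama3.2 10m)"`; afterwards, `ollama ps` should show the updated UNTIL value.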

Memory Management

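Monitor total memory usage:
# Check GPU memory (NVIDIA GPUs)
nvidia-smi

# Check system memory (Linux)
free -h

# Calculate total memory used by loaded models.
# SIZE spans two fields ($3 = value, $4 = unit), so convert MB values to GB.
ollama ps | awk 'NR>1 {v=$3; if ($4=="MB") v/=1000; sum+=v} END {printf "%.1f GB\n", sum}'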

Scripting Usage

Use in scripts and automation:
# Check if a model is running (exact match on the NAME column)
if ollama ps | awk 'NR>1 {print $1}' | grep -Fqx "llama3.2:latest"; then
    echo "Model is running"
else
    echo "Model is not loaded"
fi

# Count running models
RUNNING_COUNT=$(ollama ps | tail -n +2 | wc -l)
echo "$RUNNING_COUNT models running"

# Get memory usage of a specific model
MEMORY=$(ollama ps | grep "llama3.2" | awk '{print $3, $4}')
echo "llama3.2 using: $MEMORY"

# List models about to unload (< 1 minute)
ollama ps | grep "second"

Common Patterns

Find GPU-Only Models

Find models running entirely on GPU:
ollama ps | grep "100% GPU"

Find Large Memory Users

Find models using more than 5 GB (the value is field $3 and its unit is field $4):
ollama ps | awk 'NR>1 && $4 == "GB" && $3+0 > 5 {print $1, $3, $4}'

Monitor Unload Times

Refresh the table every 5 seconds to watch keep-alive timers count down:
watch -n 5 'ollama ps'
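To act on the timer in a script rather than by eye, the UNTIL text has to become a number. A minimal sketch that handles only the forms listed under Output Format (seconds, minutes, or hours, plus "Never" and "Stopping…"); `until_to_seconds` is a hypothetical helper:

```shell
# Hypothetical helper: until_to_seconds "UNTIL text"
# Prints approximate seconds until unload: -1 for "Never", 0 for "Stopping…".
# Returns 1 (printing nothing) for text it does not recognize.
until_to_seconds() {
    case $1 in
        Never*)    printf '%s\n' -1; return 0 ;;
        Stopping*) printf '%s\n' 0;  return 0 ;;
    esac
    set -- $1                       # split "4 minutes" into "4" "minutes"
    case $1 in
        ''|*[!0-9]*) return 1 ;;    # first word must be a whole number
    esac
    case $2 in
        second*) printf '%s\n' "$1" ;;
        minute*) printf '%s\n' $(( $1 * 60 )) ;;
        hour*)   printf '%s\n' $(( $1 * 3600 )) ;;
        *)       return 1 ;;
    esac
}
```

Because unrecognized text makes it return 1, a caller can simply skip rows it cannot parse.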

Performance Considerations

GPU vs CPU Performance

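Allocation       Typical speed    Use case
100% GPU         10-50 tok/s      Best for most workloads
CPU/GPU split    5-20 tok/s       Large models, limited VRAM
100% CPU         1-5 tok/s        No GPU, or GPU fully utilized
These figures are rough; actual throughput varies widely with model size, quantization, and hardware.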

Multiple Models

When running multiple models:
  • Total memory use is the sum of all loaded models
  • Loaded models compete for the same GPU VRAM and system RAM
  • Consider adjusting OLLAMA_MAX_LOADED_MODELS to cap how many stay loaded at once

Environment Variables

OLLAMA_HOST
string
Default: http://127.0.0.1:11434
Ollama server address to query for running models
OLLAMA_KEEP_ALIVE
duration
Default: 5m
Server-wide default keep-alive time (affects UNTIL column)
OLLAMA_MAX_LOADED_MODELS
integer
Default: 3 × the number of GPUs, or 3 for CPU inference
Maximum number of models that can be loaded simultaneously, provided they fit in available memory

Exit Codes

  • 0 - Success
  • 1 - Error (server not running, etc.)
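Because an empty table still exits with 0, a script has to look at both the exit status and the output to tell "server unreachable" apart from "nothing loaded". A sketch of that check; `describe_ps_result` is a hypothetical helper that takes the status and captured output as arguments, so it can be exercised without a running server:

```shell
# Hypothetical helper: describe_ps_result STATUS OUTPUT
# Classifies a captured `ollama ps` result. A non-zero status is an error;
# otherwise the non-empty rows after the header are counted.
describe_ps_result() {
    status=$1
    output=$2
    if [ "$status" -ne 0 ]; then
        echo "error: $output"
        return 1
    fi
    count=$(printf '%s\n' "$output" | awk 'NR > 1 && NF > 0' | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "no models loaded"
    else
        echo "$count model(s) loaded"
    fi
}
```

In a script, call it as `out=$(ollama ps 2>&1); describe_ps_result $? "$out"`.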

Troubleshooting

Server Not Running

Error: could not connect to ollama app, is it running?
Solution: Start the Ollama server:
ollama serve
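When a script starts the server itself, it may need to wait until the server accepts connections before querying it. A minimal retry sketch that relies only on the command's exit code; `wait_for_server` and `WAIT_INTERVAL` are hypothetical names, and the command to retry is passed as arguments:

```shell
# Hypothetical helper: wait_for_server TRIES CMD [ARGS...]
# Runs CMD until it exits 0, up to TRIES attempts, sleeping WAIT_INTERVAL
# seconds (default 1) between attempts. Returns 1 if every attempt fails.
wait_for_server() {
    tries=$1
    shift
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        if [ "$i" -le "$tries" ]; then
            sleep "${WAIT_INTERVAL:-1}"
        fi
    done
    return 1
}
```

For example, run `ollama serve >/dev/null 2>&1 &` and then `wait_for_server 30 ollama ps` before issuing further commands.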

No Models Showing

If no models appear, none are currently loaded. Run a model to load it:
ollama run llama3.2 "Hello"
ollama ps

Model Shows “Stopping…”

The model is being unloaded. This is normal when:
  • Keep-alive timer expired
  • You ran ollama stop MODEL
  • Server is shutting down
  • Another model needs the memory

Comparison with Other Commands

Command              Purpose
ollama ps            Show running models in memory
ollama list          Show all available models (not just running)
ollama show MODEL    Show detailed info about one model