Synopsis
Description
The ps command displays all models currently loaded in memory. It shows:
- Model names
- Model IDs (digest)
- Memory usage
- Processor allocation (CPU/GPU split)
- Context window size
- Time until automatic unload
Arguments
Optional prefix to filter models. Shows only running models whose names start with this prefix.
Options
The ps command has no flags or options.
Examples
List All Running Models
Show all currently loaded models:
Filter by Prefix
Show only running models starting with “llama”:
No Models Running
When no models are loaded:
Output Format
The output is formatted as a table with these columns:
- NAME: Full model name including tag (e.g., llama3.2:latest)
- ID: First 12 characters of the model’s SHA256 digest
- SIZE: Total memory used by the model (including model weights, KV cache, etc.)
- PROCESSOR: Where the model is running:
  - 100% GPU - Fully on GPU
  - 100% CPU - Fully on CPU
  - 45%/55% CPU/GPU - Split between CPU and GPU
  - Unknown - Cannot determine allocation
- CONTEXT: Current context window size in tokens
- UNTIL: Time until the model will be automatically unloaded from memory:
  - “4 minutes”
  - “10 seconds”
  - “Never” - Configured to stay loaded indefinitely
  - “Stopping…” - Currently being unloaded
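As a rough illustration of how this table could be consumed by a script, the sketch below turns the text output into one dictionary per model. It assumes columns are separated by at least two spaces and that no cell is empty. The function name `parse_ps_table` and the sample row (including its placeholder digest) are hypothetical and not taken from real output.

```python
import re


def parse_ps_table(text):
    """Parse table text into a list of dicts keyed by the header names.

    Assumption: cells are separated by two or more spaces, so values with
    a single inner space (such as "100% GPU") stay in one piece.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    def split_cells(line):
        return re.split(r"\s{2,}", line.strip())

    header = split_cells(lines[0])
    return [dict(zip(header, split_cells(ln))) for ln in lines[1:]]


# Hypothetical sample; the digest and values are placeholders.
SAMPLE = (
    "NAME               ID              SIZE      PROCESSOR    CONTEXT    UNTIL\n"
    "llama3.2:latest    0123456789ab    4.0 GB    100% GPU     4096       4 minutes\n"
)

rows = parse_ps_table(SAMPLE)
print(rows[0]["NAME"], rows[0]["PROCESSOR"])
```

When only the header line is present (no models loaded), the function returns an empty list.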
Understanding Processor Allocation
The PROCESSOR column shows how model layers are distributed:
100% GPU
100% CPU
Split CPU/GPU
- GPU VRAM is insufficient for the entire model
- Multiple models are loaded simultaneously
- GPU is being shared with other applications
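For scripts that need numbers rather than text, the allocation strings can be converted into a CPU/GPU percentage pair. This is a minimal sketch that handles only the four forms listed on this page. The helper name `parse_processor` is hypothetical.

```python
import re


def parse_processor(value):
    """Return (cpu_percent, gpu_percent), or None if allocation is unknown."""
    value = value.strip()
    if value == "100% GPU":
        return (0, 100)
    if value == "100% CPU":
        return (100, 0)
    # Split form, e.g. "45%/55% CPU/GPU": first number is CPU, second is GPU.
    match = re.fullmatch(r"(\d+)%/(\d+)% CPU/GPU", value)
    if match:
        return (int(match.group(1)), int(match.group(2)))
    # "Unknown" or any unrecognized text.
    return None


print(parse_processor("45%/55% CPU/GPU"))
```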
Unknown
Model Lifecycle
Models go through these states:
- Not loaded: Model exists but is not in memory
- Loading: Model is being loaded into memory
- Running: Model is loaded and ready (shown by ollama ps)
- Stopping: Model is being unloaded (shows “Stopping…”)
- Unloaded: Model removed from memory
Keep-Alive Timer
The UNTIL column shows when a model will be automatically unloaded:
- Timer resets each time the model is used
- Default is typically 5 minutes of inactivity
- Can be configured per-request or server-wide
- “Never” means the model stays loaded until manually stopped
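To compare unload times in a script, the UNTIL text can be reduced to a number of seconds. The sketch below covers the value forms shown on this page ("4 minutes", "10 seconds", "Never", "Stopping…"). Support for hours is an assumption, and any other wording raises an error instead of guessing. The function name `seconds_until_unload` is hypothetical.

```python
import math
import re

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def seconds_until_unload(until):
    """Convert an UNTIL value to seconds (math.inf for "Never")."""
    text = until.strip()
    if text == "Never":
        return math.inf  # stays loaded until manually stopped
    if text.startswith("Stopping"):
        return 0  # already being unloaded
    match = re.match(r"(\d+)\s+(second|minute|hour)s?\b", text)
    if match is None:
        raise ValueError("unrecognized UNTIL value: " + repr(until))
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


print(seconds_until_unload("4 minutes"))
```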
Memory Management
Monitor total memory usage:
Scripting Usage
Use in scripts and automation:
Common Patterns
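The patterns in this part of the page (total memory, GPU-only models, large memory users) can be sketched in Python, assuming each table row has already been parsed into a dictionary. The keys `NAME`, `SIZE`, and `PROCESSOR` are assumed column names, sizes are assumed to use decimal units such as "4.0 GB", and `example-large:latest` is a made-up model name.

```python
UNIT_BYTES = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def size_in_bytes(size):
    """Convert text like "4.0 GB" to bytes (decimal units assumed)."""
    number, unit = size.split()
    return float(number) * UNIT_BYTES[unit]


def total_bytes(rows):
    """Sum the memory used by all listed models."""
    return sum(size_in_bytes(r["SIZE"]) for r in rows)


def gpu_only(rows):
    """Keep models running entirely on GPU."""
    return [r for r in rows if r["PROCESSOR"] == "100% GPU"]


def larger_than(rows, gigabytes):
    """Keep models using more than the given number of gigabytes."""
    limit = gigabytes * 10**9
    return [r for r in rows if size_in_bytes(r["SIZE"]) > limit]


# Hypothetical rows for illustration only.
rows = [
    {"NAME": "llama3.2:latest", "SIZE": "4.0 GB", "PROCESSOR": "100% GPU"},
    {"NAME": "example-large:latest", "SIZE": "9.5 GB", "PROCESSOR": "45%/55% CPU/GPU"},
]

print([r["NAME"] for r in gpu_only(rows)])
print([r["NAME"] for r in larger_than(rows, 5)])
```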
Find GPU-Only Models
Find models running entirely on GPU:
Find Large Memory Users
Find models using more than 5 GB:
Monitor Unload Times
Watch models about to be unloaded:
Performance Considerations
GPU vs CPU Performance
| Allocation | Typical Speed | Use Case |
|---|---|---|
| 100% GPU | 10-50 tok/s | Best for most workloads |
| CPU/GPU Split | 5-20 tok/s | Large models, limited VRAM |
| 100% CPU | 1-5 tok/s | No GPU, or GPU fully utilized |
Multiple Models
When running multiple models:
- Total memory is the sum of all models
- Each model competes for resources
- Consider adjusting OLLAMA_MAX_LOADED_MODELS
Environment Variables
- OLLAMA_HOST: Ollama server address to query for running models
- OLLAMA_KEEP_ALIVE: Server-wide default keep-alive time (affects UNTIL column)
- OLLAMA_MAX_LOADED_MODELS: Maximum number of models that can be loaded simultaneously
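A script that wraps the command may want to know which settings are in effect. The sketch below reads the three variables from the environment. OLLAMA_MAX_LOADED_MODELS is named on this page, while OLLAMA_HOST and OLLAMA_KEEP_ALIVE are the names assumed here for the other two. The fallback values are also assumptions, and `ollama_env` is a hypothetical helper.

```python
import os


def ollama_env(environ=None):
    """Collect the settings that influence what the ps command reports."""
    if environ is None:
        environ = os.environ
    return {
        # Assumed fallback: local server on the usual port.
        "host": environ.get("OLLAMA_HOST", "127.0.0.1:11434"),
        # Assumed fallback: 5 minutes, matching the typical default above.
        "keep_alive": environ.get("OLLAMA_KEEP_ALIVE", "5m"),
        # No fallback assumed; None means the variable is not set.
        "max_loaded_models": environ.get("OLLAMA_MAX_LOADED_MODELS"),
    }


print(ollama_env({"OLLAMA_KEEP_ALIVE": "30m"}))
```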
Exit Codes
- 0 - Success
- 1 - Error (server not running, etc.)
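In automation, the exit code is usually the first thing to check. The following sketch runs the command and reports success only for exit code 0. The wrapper name `run_ps` is hypothetical, and treating a missing binary as a failure is a design choice of this example rather than documented behavior.

```python
import subprocess


def run_ps(command=("ollama", "ps")):
    """Run the command and return (ok, text).

    ok is True only when the process exits with code 0. On failure, text
    holds the error output so the caller can log it.
    """
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        return False, "command not found: " + command[0]
    if result.returncode == 0:
        return True, result.stdout
    return False, result.stderr.strip()


if __name__ == "__main__":
    ok, text = run_ps()
    print(text if ok else "ps failed: " + text)
```

The `command` parameter exists mainly so the wrapper can be exercised without a running server.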
Troubleshooting
Server Not Running
No Models Showing
If no models appear, none are currently loaded. Run a model to load it:
Model Shows “Stopping…”
The model is being unloaded. This is normal when:
- Keep-alive timer expired
- You ran ollama stop MODEL
- Server is shutting down
- Another model needs the memory
Comparison with Other Commands
| Command | Purpose |
|---|---|
| ollama ps | Show running models in memory |
| ollama list | Show all available models (not just running) |
| ollama show MODEL | Show detailed info about one model |
Related Commands
- ollama list - List all available models
- ollama stop - Stop a running model
- ollama run - Load and run a model
- ollama serve - Configure server memory settings