NVIDIA GPUs
Ollama supports NVIDIA GPUs with compute capability 5.0+ and driver version 531 or newer.
Compatibility
Check your GPU’s compute capability at developer.nvidia.com/cuda-gpus.
NVIDIA GPU Compatibility Table
| Compute Capability | Family | Example Cards |
|---|---|---|
| 12.1 | NVIDIA | GB10 (DGX Spark) |
| 12.0 | GeForce RTX 50xx | RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060 Ti, RTX 5060 |
| 9.0 | NVIDIA | H200, H100 |
| 8.9 | GeForce RTX 40xx | RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060 |
| 8.6 | GeForce RTX 30xx | RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070, RTX 3060 |
| 8.0 | NVIDIA Professional | A100, A30 |
| 7.5 | GeForce GTX/RTX | RTX 2080 Ti, RTX 2080, RTX 2070, RTX 2060, GTX 1650 Ti |
| 7.0 | NVIDIA | TITAN V, V100, Quadro GV100 |
| 6.1 | GeForce GTX | GTX 1080 Ti, GTX 1080, GTX 1070, GTX 1060, GTX 1050 Ti |
| 6.0 | NVIDIA | Tesla P100, Quadro GP100 |
| 5.2 | GeForce GTX | GTX 980 Ti, GTX 980, GTX 970, GTX 960 |
| 5.0 | GeForce GTX | GTX 750 Ti, GTX 750 |
GPU Selection
Limit Ollama to specific GPUs by setting `CUDA_VISIBLE_DEVICES` to a comma-separated list of GPU IDs.
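A minimal sketch of restricting Ollama to two GPUs. The IDs `0,1` are placeholders; `nvidia-smi -L` lists the installed GPUs with their UUIDs, which can be used in place of numeric IDs. The variable has to be present in the environment of the Ollama server process, not only the client.

```shell
# List the installed GPUs and their UUIDs, if the NVIDIA tools are present.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -L
fi

# Placeholder IDs: expose only the first two GPUs to the server.
export CUDA_VISIBLE_DEVICES=0,1

# Start the server from this shell so it inherits the variable:
# ollama serve
```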
Linux Suspend/Resume Issue
After a suspend/resume cycle, Ollama may fail to detect NVIDIA GPUs due to a driver bug. Workaround: reload the NVIDIA UVM kernel module with `sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm`.
AMD Radeon GPUs
Ollama supports AMD GPUs via ROCm (Linux and Windows) and Vulkan (experimental).
Linux Support (ROCm)
AMD Radeon Consumer GPUs
- RX 7000 Series: 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600
- RX 6000 Series: 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800
- Vega: Vega 64
AMD Radeon PRO
W7900, W7800, W7700, W7600, W7500, W6900X, W6800X, W6800, V620, V420, V340, V320, Vega II
AMD Instinct (Data Center)
MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60
Windows Support (ROCm v6.1)
Supported GPUs on Windows
- RX 7000 Series: 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600
- RX 6000 Series: 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800
- Radeon PRO: W7900, W7800, W7700, W7600, W7500, W6900X, W6800X, W6800, V620
GPU Selection
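Limit Ollama to specific AMD GPUs by setting `ROCR_VISIBLE_DEVICES` to a comma-separated list of devices. List the available devices with `rocminfo`.
Force CPU usage by setting an invalid GPU ID such as `-1`.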
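A sketch of both settings, assuming the variable in question is `ROCR_VISIBLE_DEVICES`, the device-visibility variable read by the ROCm runtime. The IDs are placeholders; an invalid ID such as `-1` hides every GPU, so inference falls back to the CPU.

```shell
# Placeholder IDs: expose only the first two AMD GPUs to the server.
export ROCR_VISIBLE_DEVICES=0,1

# To force CPU usage instead, hide every GPU with an invalid ID:
# export ROCR_VISIBLE_DEVICES=-1
```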
Overriding GPU Targets (Linux)
Force unsupported AMD GPUs to use a compatible LLVM target by setting `HSA_OVERRIDE_GFX_VERSION` to that target's version in `x.y.z` form.
Supported LLVM Targets
| LLVM Target | Example GPU |
|---|---|
| gfx908 | Radeon Instinct MI100 |
| gfx90a | Radeon Instinct MI210 |
| gfx940 | Radeon Instinct MI300 |
| gfx1030 | Radeon PRO V620 |
| gfx1100 | Radeon PRO W7900 |
| gfx1101 | Radeon PRO W7700 |
| gfx1102 | Radeon RX 7600 |
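As an illustration, assume a GPU whose own target is close to `gfx1030` but is not in the table. The override is given through `HSA_OVERRIDE_GFX_VERSION` in `x.y.z` form, so `gfx1030` is written `10.3.0`. This is a sketch only: whether a particular unsupported GPU works with a neighboring target is not guaranteed.

```shell
# gfx1030 written in x.y.z form is 10.3.0.
export HSA_OVERRIDE_GFX_VERSION="10.3.0"
```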
Container Permissions (Linux)
On some distributions, SELinux prevents containers from accessing GPU devices. Enable access on the host with `sudo setsebool container_use_devices=1`.
Apple Metal (macOS)
Ollama automatically uses the Metal API for GPU acceleration on Apple Silicon (M1, M2, M3, M4) and Intel Macs with dedicated GPUs.
Metal Support
No configuration is needed: Metal acceleration is enabled by default on macOS.
Vulkan GPU Support
Vulkan provides additional GPU support on Windows and Linux, especially for GPUs not covered by CUDA or ROCm.
Enabling Vulkan
Set the environment variable `OLLAMA_VULKAN=1` for the Ollama server.
Installation
Windows: Most GPU drivers include Vulkan support by default.
Linux: Install Vulkan components:
- Intel GPUs: Intel GPU Driver Instructions
- AMD GPUs: AMD Vulkan Installation
Then add the `ollama` user to the `render` group.
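The group change can be sketched as below. The helper name `in_group` is illustrative, and the script only prints the `usermod` command rather than running it, since that step needs root. It assumes the service account is named `ollama` and that the distribution uses a `render` group for GPU device nodes.

```shell
# Report whether user $1 belongs to group $2 (helper name is illustrative).
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

if in_group ollama render; then
  echo "ollama is already in the render group"
else
  # Requires root; the change applies to processes started afterwards.
  echo "run: sudo usermod -aG render ollama"
fi
```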
GPU Selection
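Select specific Vulkan GPUs by setting `GGML_VK_VISIBLE_DEVICES` to one or more numeric device IDs.
VRAM Reporting
For optimal scheduling, grant the Ollama binary the `cap_perfmon` capability, which lets it report available VRAM.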
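One way to grant the capability is with `setcap`. The sketch below only prints the command so it can be reviewed before being run as root. The fallback path `/usr/local/bin/ollama` is an assumption; the actual install location varies by system.

```shell
# Locate the binary; fall back to an assumed install path if it is not on PATH.
OLLAMA_BIN="$(command -v ollama || echo /usr/local/bin/ollama)"

# Print the command instead of executing it, since it requires root.
echo "sudo setcap cap_perfmon+ep $OLLAMA_BIN"
```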
GPU Memory Management
Reserve VRAM Per GPU
Reserve a portion of VRAM to prevent GPU memory exhaustion by setting `OLLAMA_GPU_OVERHEAD`, the number of bytes to reserve per GPU (default: 0).
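Because the value is expressed in bytes, it is easy to get the magnitude wrong. A small sketch, assuming the setting is the `OLLAMA_GPU_OVERHEAD` variable and that 1 GiB is a reasonable amount to hold back (the figure is only an example):

```shell
# Reserve 1 GiB per GPU; the value is given in bytes.
export OLLAMA_GPU_OVERHEAD=$((1024 * 1024 * 1024))
echo "$OLLAMA_GPU_OVERHEAD"   # 1073741824
```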
Multi-GPU Scheduling
By default, Ollama loads a model on a single GPU. To spread models across GPUs, set `OLLAMA_SCHED_SPREAD` on the Ollama server. This allows model layers to be scheduled across all available GPUs, which helps with larger models.
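A minimal sketch, assuming the relevant setting is `OLLAMA_SCHED_SPREAD` and that a value of `1` turns it on. As with the other variables, it needs to be set for the server process.

```shell
# Ask the scheduler to spread model layers across all available GPUs.
export OLLAMA_SCHED_SPREAD=1

# Start the server from this shell so it inherits the variable:
# ollama serve
```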
Troubleshooting
No GPU Detected
Check drivers
Ensure the latest GPU drivers are installed:
- NVIDIA: Driver version 531+
- AMD: Latest ROCm or AMDGPU driver
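The NVIDIA requirement above can be checked from a shell. This is a sketch: `driver_ok` is an illustrative helper, and the `nvidia-smi` query runs only if the tool is installed.

```shell
# Succeeds when the major part of a driver version string is at least 531.
driver_ok() {
  major="${1%%.*}"
  [ -n "$major" ] && [ "$major" -ge 531 ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Read the driver version of the first GPU.
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if driver_ok "$version"; then
    echo "driver $version meets the 531 minimum"
  else
    echo "driver '$version' is missing or older than 531"
  fi
fi
```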
Performance Issues
Model too large for VRAM
Try a smaller model or a more heavily quantized variant of the same model.
Multiple applications using GPU
Close other GPU-intensive applications or adjust `OLLAMA_GPU_OVERHEAD`.
Thermal throttling
Monitor GPU temperatures and improve cooling if needed.
Related
- Environment Variables: Configure GPU selection and memory settings
- Model Quantization: Reduce model memory usage with quantization