Ollama supports GPU acceleration across multiple vendors, significantly improving model inference performance.

NVIDIA GPUs

Ollama supports NVIDIA GPUs with compute capability 5.0+ and driver version 531 or newer.

Compatibility

Check your GPU’s compute capability at developer.nvidia.com/cuda-gpus.

| Compute Capability | Family | Example Cards |
| --- | --- | --- |
| 12.1 | NVIDIA | GB10 (DGX Spark) |
| 12.0 | GeForce RTX 50xx | RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060 Ti, RTX 5060 |
| 9.0 | NVIDIA | H200, H100 |
| 8.9 | GeForce RTX 40xx | RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060 |
| 8.6 | GeForce RTX 30xx | RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070, RTX 3060 |
| 8.0 | NVIDIA Professional | A100, A30 |
| 7.5 | GeForce GTX/RTX | RTX 2080 Ti, RTX 2080, RTX 2070, RTX 2060, GTX 1650 Ti |
| 7.0 | NVIDIA | TITAN V, V100, Quadro GV100 |
| 6.1 | GeForce GTX | GTX 1080 Ti, GTX 1080, GTX 1070, GTX 1060, GTX 1050 Ti |
| 6.0 | NVIDIA | Tesla P100, Quadro GP100 |
| 5.2 | GeForce GTX | GTX 980 Ti, GTX 980, GTX 970, GTX 960 |
| 5.0 | GeForce GTX | GTX 750 Ti, GTX 750 |
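As a rough self-check against the 5.0 minimum stated above, the following sketch compares a compute capability string with that threshold. The helper name cc_supported is hypothetical, and the nvidia-smi compute_cap query field is assumed to be available, which depends on the installed driver version.

```shell
# Hypothetical helper: succeeds when the major part of a compute
# capability string such as "8.6" is at least 5.
cc_supported() {
  major=${1%%.*}
  case $major in
    ''|*[!0-9]*) return 2 ;;   # not a number, cannot decide
  esac
  [ "$major" -ge 5 ]
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
      cap=${cap# }             # csv output separates fields with ", "
      if cc_supported "$cap"; then
        echo "$name: compute capability $cap (meets the 5.0 minimum)"
      else
        echo "$name: compute capability '$cap' (below 5.0 or unreadable)"
      fi
    done
fi
```

The comparison only looks at the major version, which is enough here because the documented minimum is exactly 5.0.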

GPU Selection

Limit Ollama to specific GPUs using CUDA_VISIBLE_DEVICES:
# Using GPU IDs
export CUDA_VISIBLE_DEVICES=0,1

# Using UUIDs (more reliable)
export CUDA_VISIBLE_DEVICES=GPU-abc123,GPU-def456
Discover GPU UUIDs with nvidia-smi -L. UUIDs are more reliable than numeric IDs as ordering may vary.
Force CPU usage:
export CUDA_VISIBLE_DEVICES=-1
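To build the UUID form of CUDA_VISIBLE_DEVICES without copying values by hand, one option is to extract the UUIDs from nvidia-smi -L output. This is a minimal sketch: the function name uuids_csv is hypothetical, and it assumes each GPU line ends with a "(UUID: GPU-...)" suffix.

```shell
# Hypothetical helper: reads `nvidia-smi -L` style lines on stdin and
# prints the GPU UUIDs as one comma-separated list.
uuids_csv() {
  sed -n 's/.*(UUID: \(GPU-[^)]*\)).*/\1/p' | paste -s -d , -
}

# Only call nvidia-smi when it is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  devices=$(nvidia-smi -L | uuids_csv)
  if [ -n "$devices" ]; then
    # Print the line rather than exporting it, so the list can be
    # trimmed to the GPUs Ollama should actually use.
    echo "export CUDA_VISIBLE_DEVICES=$devices"
  fi
fi
```

Remove the UUIDs of any GPUs you want to hide before using the printed line.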

Linux Suspend/Resume Issue

After a suspend/resume cycle, Ollama may fail to detect NVIDIA GPUs due to a driver bug. Workaround:
sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm

AMD Radeon GPUs

Ollama supports AMD GPUs via ROCm (Linux and Windows) and Vulkan (experimental).

Linux Support (ROCm)

  • RX 7000 Series: 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600
  • RX 6000 Series: 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800
  • Vega: Vega 64
  • Radeon PRO: W7900, W7800, W7700, W7600, W7500, W6900X, W6800X, W6800, V620, V420, V340, V320, Vega II
  • Instinct: MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60

Windows Support (ROCm v6.1)

  • RX 7000 Series: 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600
  • RX 6000 Series: 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800
  • Radeon PRO: W7900, W7800, W7700, W7600, W7500, W6900X, W6800X, W6800, V620

GPU Selection

Limit Ollama to specific AMD GPUs:
# Using numeric IDs
export ROCR_VISIBLE_DEVICES=0,1

# Using UUIDs (recommended)
export ROCR_VISIBLE_DEVICES=<uuid1>,<uuid2>
Discover device UUIDs with rocminfo. Force CPU usage:
export ROCR_VISIBLE_DEVICES=-1

Overriding GPU Targets (Linux)

Force unsupported AMD GPUs to use a compatible LLVM target:
# Example: Force RX 5400 (gfx1034) to use gfx1030
export HSA_OVERRIDE_GFX_VERSION="10.3.0"
Multiple GPUs with different versions:
export HSA_OVERRIDE_GFX_VERSION_0=10.3.0
export HSA_OVERRIDE_GFX_VERSION_1=11.0.0
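The override value is the LLVM target name written as a dotted version, as in gfx1030 becoming 10.3.0 above. The sketch below performs that conversion for all-numeric target names. The function name gfx_to_override is hypothetical, and targets with a letter in the last position (such as gfx90a) are deliberately rejected rather than guessed at.

```shell
# Hypothetical helper: converts an all-numeric LLVM target name such as
# "gfx1030" into the dotted form "10.3.0" (major.minor.stepping).
gfx_to_override() {
  target=${1#gfx}
  case $target in
    ''|*[!0-9]*)
      echo "unsupported target name: $1" >&2
      return 1 ;;
  esac
  if [ "${#target}" -lt 3 ]; then
    echo "target name too short: $1" >&2
    return 1
  fi
  stepping=${target#"${target%?}"}   # last digit
  rest=${target%?}
  minor=${rest#"${rest%?}"}          # second-to-last digit
  major=${rest%?}                    # everything before that
  printf '%s.%s.%s\n' "$major" "$minor" "$stepping"
}

# Example: print the export line for the gfx1030 target.
echo "export HSA_OVERRIDE_GFX_VERSION=$(gfx_to_override gfx1030)"
```

Pass the target you want the GPU to be treated as, not the GPU's native target.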

Supported LLVM Targets

| LLVM Target | Example GPU |
| --- | --- |
| gfx908 | Radeon Instinct MI100 |
| gfx90a | Radeon Instinct MI210 |
| gfx940 | Radeon Instinct MI300 |
| gfx1030 | Radeon PRO V620 |
| gfx1100 | Radeon PRO W7900 |
| gfx1101 | Radeon PRO W7700 |
| gfx1102 | Radeon RX 7600 |

Overriding GPU targets is experimental. Test thoroughly before production use.

Container Permissions (Linux)

On some distributions, SELinux prevents containers from accessing GPU devices:
sudo setsebool container_use_devices=1

Apple Metal (macOS)

Ollama automatically uses the Metal API for GPU acceleration on Apple Silicon (M1, M2, M3, M4) and on Intel Macs with dedicated GPUs.

Metal Support

No configuration needed - Metal acceleration is enabled by default on macOS.

Vulkan GPU Support

Vulkan support is currently experimental. Enable by setting OLLAMA_VULKAN=1.
Vulkan provides additional GPU support on Windows and Linux, especially for GPUs not covered by CUDA or ROCm.

Enabling Vulkan

Set the environment variable for the Ollama server:
export OLLAMA_VULKAN=1
See Environment Variables for how to configure the Ollama server.

Installation

Windows: Most GPU drivers include Vulkan support by default.
Linux: Install the Vulkan components for your distribution.
AMD GPUs on Linux: Add the ollama user to the render group:
sudo usermod -aG render ollama

GPU Selection

Select specific Vulkan GPUs:
export GGML_VK_VISIBLE_DEVICES=0,1
Disable Vulkan:
export GGML_VK_VISIBLE_DEVICES=-1

VRAM Reporting

For optimal scheduling, grant the cap_perfmon capability:
sudo setcap cap_perfmon+ep /usr/local/bin/ollama
Without this capability, Ollama uses approximate model sizes for scheduling decisions.

GPU Memory Management

Reserve VRAM Per GPU

Reserve a portion of VRAM to prevent GPU memory exhaustion:
# Reserve 2 GiB (2 * 1024^3 bytes) per GPU
export OLLAMA_GPU_OVERHEAD=2147483648
OLLAMA_GPU_OVERHEAD (integer): Number of bytes to reserve per GPU (default: 0).
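Because the value is given in bytes, it can be easier to compute it than to type it. A minimal sketch, assuming you think of the reservation in whole GiB; the function name gib_to_bytes is hypothetical.

```shell
# Hypothetical helper: converts a whole number of GiB to bytes.
gib_to_bytes() {
  case $1 in
    ''|*[!0-9]*)
      echo "expected a whole number of GiB, got: '$1'" >&2
      return 1 ;;
  esac
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Reserve 2 GiB per GPU (2147483648 bytes).
OLLAMA_GPU_OVERHEAD=$(gib_to_bytes 2)
export OLLAMA_GPU_OVERHEAD
echo "OLLAMA_GPU_OVERHEAD=$OLLAMA_GPU_OVERHEAD"
```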

Multi-GPU Scheduling

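By default, Ollama loads a model on a single GPU when the model fits entirely in that GPU's VRAM, and spreads it across GPUs only when it does not. To always spread models across all GPUs:
export OLLAMA_SCHED_SPREAD=true
With this setting, model layers are scheduled across all available GPUs regardless of model size.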

Troubleshooting

No GPU Detected

1. Check drivers

Ensure the latest GPU drivers are installed:
  • NVIDIA: Driver version 531+
  • AMD: Latest ROCm or AMDGPU driver

2. Verify GPU visibility

Check if the GPU is visible to the system:
# NVIDIA
nvidia-smi

# AMD
rocminfo

3. Check environment variables

Ensure that CUDA_VISIBLE_DEVICES, ROCR_VISIBLE_DEVICES, or GGML_VK_VISIBLE_DEVICES is not set to a value that hides your GPUs.
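One way to run this check is to print the state of each visibility variable in the shell that starts the Ollama server. This is a sketch only: the function name check_gpu_visibility is hypothetical, and it inspects the current shell's environment, which may differ from the environment of a server started as a system service.

```shell
# Hypothetical helper: reports whether each GPU visibility variable is
# unset, set to -1, or restricted to specific devices.
check_gpu_visibility() {
  for name in CUDA_VISIBLE_DEVICES ROCR_VISIBLE_DEVICES GGML_VK_VISIBLE_DEVICES; do
    eval "is_set=\${$name+yes}"   # "yes" if the variable exists, even if empty
    eval "value=\${$name-}"
    if [ -z "$is_set" ]; then
      echo "$name: unset (no restriction)"
    elif [ "$value" = "-1" ]; then
      echo "$name: -1 (devices hidden for this backend)"
    else
      echo "$name: restricted to '$value'"
    fi
  done
}

check_gpu_visibility
```

If a variable is reported as -1 or restricted unexpectedly, unset it and restart the server.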

Performance Issues

Try a smaller or quantized model:
# Use a smaller model
ollama run llama3.2:1b

# Or use an explicitly quantized tag (tag names vary by model)
ollama run llama3.2:3b-instruct-q4_K_M
Close other GPU-intensive applications or adjust OLLAMA_GPU_OVERHEAD.
Monitor GPU temperatures and improve cooling if needed.

Environment Variables

Configure GPU selection and memory settings

Model Quantization

Reduce model memory usage with quantization