GPU Configuration

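Ollama supports GPU acceleration on NVIDIA, AMD, and Apple hardware, with experimental Vulkan support for other GPUs. Running a model on a supported GPU is typically much faster than running it on the CPU alone.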

NVIDIA GPUs

Ollama supports NVIDIA GPUs with compute capability 5.0+ and driver version 531 or newer.

Compatibility

Check your GPU’s compute capability at developer.nvidia.com/cuda-gpus.

NVIDIA GPU Compatibility Table

Compute Capability	Family	Example Cards
12.1	NVIDIA	GB10 (DGX Spark)
12.0	GeForce RTX 50xx	RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060 Ti, RTX 5060
9.0	NVIDIA	H200, H100
8.9	GeForce RTX 40xx	RTX 4090, RTX 4080, RTX 4070 Ti, RTX 4070, RTX 4060 Ti, RTX 4060
8.6	GeForce RTX 30xx	RTX 3090 Ti, RTX 3090, RTX 3080 Ti, RTX 3080, RTX 3070, RTX 3060
8.0	NVIDIA Professional	A100, A30
7.5	GeForce GTX/RTX	RTX 2080 Ti, RTX 2080, RTX 2070, RTX 2060, GTX 1650 Ti
7.0	NVIDIA	TITAN V, V100, Quadro GV100
6.1	GeForce GTX	GTX 1080 Ti, GTX 1080, GTX 1070, GTX 1060, GTX 1050 Ti
6.0	NVIDIA	Tesla P100, Quadro GP100
5.2	GeForce GTX	GTX 980 Ti, GTX 980, GTX 970, GTX 960
5.0	GeForce GTX	GTX 750 Ti, GTX 750

GPU Selection

Limit Ollama to specific GPUs using CUDA_VISIBLE_DEVICES:

# Using GPU IDs
export CUDA_VISIBLE_DEVICES=0,1

# Using UUIDs (more reliable)
export CUDA_VISIBLE_DEVICES=GPU-abc123,GPU-def456

List GPU UUIDs with nvidia-smi -L. UUIDs are more reliable than numeric IDs because the numeric ordering of devices can vary.
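Because nvidia-smi -L prints one line per GPU including its UUID, the comma-separated list can be built from that output instead of being copied by hand. The sketch below assumes lines of the form "GPU 0: <name> (UUID: GPU-xxxxxxxx)" and skips the indented MIG device lines; gpu_uuids_csv is a hypothetical helper name, not part of Ollama or the NVIDIA tools.

```shell
# Hypothetical helper: reads `nvidia-smi -L` output on stdin and prints the
# GPU UUIDs as a comma-separated list for CUDA_VISIBLE_DEVICES.
# Assumes top-level GPU lines look like:
#   GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-xxxxxxxx)
# Indented MIG device lines do not start with "GPU" and are ignored.
gpu_uuids_csv() {
  # Print only the captured "GPU-..." UUID from each GPU line, then join with commas.
  sed -n 's/^GPU [0-9][0-9]*:.*(UUID: \(GPU-[^)]*\)).*$/\1/p' | paste -sd, -
}

# Usage, on a machine with the NVIDIA driver installed:
#   export CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | gpu_uuids_csv)"
```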

Force CPU usage:

export CUDA_VISIBLE_DEVICES=-1

Linux Suspend/Resume Issue

On Linux, after a suspend/resume cycle, Ollama may fail to detect NVIDIA GPUs because of a driver bug and fall back to running on the CPU. As a workaround, reload the NVIDIA UVM kernel module:

sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm

AMD Radeon GPUs

Ollama supports AMD GPUs via ROCm (Linux and Windows) and Vulkan (experimental).

Linux Support (ROCm)

AMD Radeon Consumer GPUs

RX 7000 Series: 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600
RX 6000 Series: 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800
Vega: Vega 64

AMD Radeon PRO

W7900, W7800, W7700, W7600, W7500, W6900X, W6800X, W6800, V620, V420, V340, V320, Vega II

AMD Instinct (Data Center)

MI300X, MI300A, MI300, MI250X, MI250, MI210, MI200, MI100, MI60

Windows Support (ROCm v6.1)

Supported GPUs on Windows

RX 7000 Series: 7900 XTX, 7900 XT, 7900 GRE, 7800 XT, 7700 XT, 7600 XT, 7600
RX 6000 Series: 6950 XT, 6900 XTX, 6900 XT, 6800 XT, 6800
Radeon PRO: W7900, W7800, W7700, W7600, W7500, W6900X, W6800X, W6800, V620

GPU Selection

Limit Ollama to specific AMD GPUs:

# Using numeric IDs
export ROCR_VISIBLE_DEVICES=0,1

# Using UUIDs (recommended)
export ROCR_VISIBLE_DEVICES=<uuid1>,<uuid2>

List device UUIDs with rocminfo.

Force CPU usage:

export ROCR_VISIBLE_DEVICES=-1

Overriding GPU Targets (Linux)

Force unsupported AMD GPUs to use a compatible LLVM target:

# Example: Force RX 5400 (gfx1034) to use gfx1030
export HSA_OVERRIDE_GFX_VERSION="10.3.0"

If you have multiple GPUs with different targets, append the numeric device ID to the variable name to set each one individually:

export HSA_OVERRIDE_GFX_VERSION_0=10.3.0
export HSA_OVERRIDE_GFX_VERSION_1=11.0.0
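HSA_OVERRIDE_GFX_VERSION expects the LLVM target written as major.minor.stepping, as in the gfx1030 to 10.3.0 example above. A minimal sketch of that conversion, assuming the last two characters of the target are the minor and stepping digits in hexadecimal (so gfx90a becomes 9.0.10); gfx_to_hsa_version is a hypothetical helper, not an Ollama or ROCm command.

```shell
# Hypothetical helper: converts an LLVM target such as gfx1030 into the
# "major.minor.stepping" form used by HSA_OVERRIDE_GFX_VERSION.
# Assumption: the last two characters are the minor and stepping digits (hex),
# and everything between "gfx" and those two characters is the major version.
# The body runs in a subshell ( ... ) so its variables do not leak.
gfx_to_hsa_version() (
  case $1 in
    gfx???*) ;;
    *) echo "usage: gfx_to_hsa_version gfxNNNN" >&2; exit 1 ;;
  esac
  target=${1#gfx}           # gfx1030 -> 1030
  major=${target%??}        # drop the last two characters -> 10
  rest=${target#"$major"}   # keep the last two characters -> 30
  minor=${rest%?}           # -> 3
  stepping=${rest#?}        # -> 0
  # Minor and stepping are parsed as hex, so gfx90a yields a stepping of 10.
  printf '%d.%d.%d\n' "$major" "0x$minor" "0x$stepping"
)

# Usage:
#   export HSA_OVERRIDE_GFX_VERSION="$(gfx_to_hsa_version gfx1030)"
```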

Supported LLVM Targets

LLVM Target	Example GPU
gfx908	Radeon Instinct MI100
gfx90a	Radeon Instinct MI210
gfx940	Radeon Instinct MI300
gfx1030	Radeon PRO V620
gfx1100	Radeon PRO W7900
gfx1101	Radeon PRO W7700
gfx1102	Radeon RX 7600

Overriding GPU targets is experimental. Test thoroughly before production use.

Container Permissions (Linux)

On some Linux distributions, SELinux can prevent containers from accessing GPU devices. To allow containers to use devices, run the following on the host:

sudo setsebool container_use_devices=1

Apple Metal (macOS)

Ollama automatically uses the Metal API for GPU acceleration on Apple Silicon (M1, M2, M3, M4) and Intel Macs with dedicated GPUs.

Metal Support

No configuration is needed: Metal acceleration is enabled by default on macOS.

Vulkan GPU Support

Vulkan support is currently experimental. Enable it by setting OLLAMA_VULKAN=1.

Vulkan provides additional GPU support on Windows and Linux, especially for GPUs not covered by CUDA or ROCm.

Enabling Vulkan

Set the environment variable for the Ollama server:

export OLLAMA_VULKAN=1

See Environment Variables for how to configure the Ollama server.
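When Ollama was installed on Linux with the official install script, the server typically runs as a systemd service, so a variable exported in your own shell does not reach it. A minimal sketch of one way to set OLLAMA_VULKAN for the service, assuming the unit is named ollama.service; the drop-in file name vulkan.conf is an arbitrary choice for this example.

```shell
# Assumes the server runs as the systemd unit "ollama.service".
# "vulkan.conf" is an arbitrary drop-in name; systemd reads any *.conf file here.
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/vulkan.conf >/dev/null <<'EOF'
[Service]
Environment="OLLAMA_VULKAN=1"
EOF

# Reload unit definitions and restart the server so the variable takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

The same pattern works for the other server variables on this page, such as CUDA_VISIBLE_DEVICES or OLLAMA_GPU_OVERHEAD.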

Installation

Windows: Most GPU drivers include Vulkan support by default.

Linux: Install the Vulkan components for your GPU vendor:

Intel GPUs: Intel GPU Driver Instructions
AMD GPUs: AMD Vulkan Installation

AMD GPUs on Linux: Add the ollama user to the render group:

sudo usermod -aG render ollama
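Group changes only apply to processes started afterwards, so restart the Ollama service after adding the user. To confirm the membership first, a small check can be scripted around id -nG; user_in_group is a hypothetical helper used only for this sketch.

```shell
# Hypothetical helper: succeeds if user $1 is a member of group $2.
user_in_group() {
  # `id -nG USER` prints the user's group names separated by spaces.
  id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Usage:
#   user_in_group ollama render || sudo usermod -aG render ollama
#   sudo systemctl restart ollama
```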

GPU Selection

Select specific Vulkan GPUs:

export GGML_VK_VISIBLE_DEVICES=0,1

Disable Vulkan:

export GGML_VK_VISIBLE_DEVICES=-1

VRAM Reporting

For the most accurate VRAM reporting and scheduling, grant the Ollama binary the cap_perfmon capability (adjust the path if Ollama is installed elsewhere):

sudo setcap cap_perfmon+ep /usr/local/bin/ollama

Without this capability, Ollama uses approximate model sizes for scheduling decisions.

GPU Memory Management

Reserve VRAM Per GPU

Reserve a fixed amount of VRAM on each GPU so that Ollama does not exhaust GPU memory, for example to leave room for other applications:

# Reserve 2 GiB per GPU
export OLLAMA_GPU_OVERHEAD=2147483648

OLLAMA_GPU_OVERHEAD (integer): number of bytes to reserve per GPU (default: 0).
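Since the variable takes raw bytes, it is easy to get the arithmetic wrong. A minimal sketch that derives the value from a whole number of GiB with shell arithmetic; gib_to_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: converts a whole number of GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# 2 GiB -> 2147483648 bytes
export OLLAMA_GPU_OVERHEAD="$(gib_to_bytes 2)"
```

For fractional sizes, express the amount in MiB instead (for example 1536 MiB for 1.5 GiB) and multiply by 1024 * 1024.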

Multi-GPU Scheduling

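By default, Ollama loads a model onto a single GPU when it fits entirely in that GPU's VRAM, and spreads it across multiple GPUs only when it does not. To always spread models across all available GPUs:

export OLLAMA_SCHED_SPREAD=true

With this setting, model layers are scheduled across all available GPUs even when the model would fit on one.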

Troubleshooting

No GPU Detected

Check drivers

Ensure the latest GPU drivers are installed:

NVIDIA: Driver version 531+
AMD: Latest ROCm or AMDGPU driver

Verify GPU visibility

Check if the GPU is visible to the system:

# NVIDIA
nvidia-smi

# AMD
rocminfo

Check environment variables

Make sure CUDA_VISIBLE_DEVICES, ROCR_VISIBLE_DEVICES, or GGML_VK_VISIBLE_DEVICES are not set to values that hide your GPUs, such as -1.
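A quick way to see which of these variables are set is sketched below; check_gpu_visibility is a hypothetical helper. It only inspects variables exported in the current shell, so for a server running as a systemd service, check the service environment instead (for example with systemctl show ollama --property=Environment).

```shell
# Hypothetical helper: prints each GPU visibility variable that is exported in
# the current environment and flags -1, which hides every GPU for that backend.
check_gpu_visibility() {
  for var in CUDA_VISIBLE_DEVICES ROCR_VISIBLE_DEVICES GGML_VK_VISIBLE_DEVICES; do
    # printenv exits non-zero when the variable is not set.
    if value=$(printenv "$var"); then
      if [ "$value" = "-1" ]; then
        echo "$var=-1 (hides all GPUs for this backend)"
      else
        echo "$var=$value"
      fi
    fi
  done
}
```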

Performance Issues

Model too large for VRAM

Try a smaller or quantized model:

# Use a smaller model
ollama run llama3.2:1b

# Or use a quantized version
ollama run llama3.2:3b-instruct-q4_K_M

Multiple applications using GPU

Close other GPU-intensive applications or adjust OLLAMA_GPU_OVERHEAD.

Thermal throttling

Monitor GPU temperatures and improve cooling if needed.
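On NVIDIA GPUs, nvidia-smi can report temperatures in CSV form, which makes a simple threshold check easy to script. In this sketch, hot_gpus is a hypothetical helper and the 85 °C default is an arbitrary example threshold; check your card's specifications for a meaningful limit.

```shell
# Hypothetical helper: reads "index, temperature" CSV lines on stdin and prints
# the GPUs whose temperature exceeds a threshold in degrees C (default 85, an
# arbitrary example value).
hot_gpus() {
  # Fields are separated by a comma plus optional spaces, e.g. "0, 65".
  awk -F', *' -v max="${1:-85}" '$2 + 0 > max + 0 { printf "GPU %s: %s C\n", $1, $2 }'
}

# Usage, on a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | hot_gpus 85
```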

Related

Environment Variables: Configure GPU selection and memory settings

Model Quantization: Reduce model memory usage with quantization
