How can I upgrade Ollama?
How can I view the logs?
- macOS: cat ~/.ollama/logs/server.log
- Linux: journalctl -u ollama --no-pager --follow --pager-end
- Docker: docker logs <container-name>
- Windows: explorer %LOCALAPPDATA%\Ollama
Is my GPU compatible with Ollama?
- NVIDIA GPUs (CUDA)
- AMD GPUs (ROCm)
- Apple Silicon (Metal)
- Intel/AMD GPUs (Vulkan)
How can I specify the context window size?
How can I tell if my model was loaded onto the GPU?
Use the ollama ps command to see what models are currently loaded into memory. The Processor column shows where the model was loaded:
- 100% GPU - Model loaded entirely into the GPU
- 100% CPU - Model loaded entirely in system memory
- 48%/52% CPU/GPU - Model loaded partially onto both GPU and system memory
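As a rough illustration of how to read the Processor column, the sketch below interprets the three example values listed above. The function name parse_processor is hypothetical, and the parsing assumes only the formats shown in this FAQ; it is not part of Ollama.

```python
def parse_processor(value: str) -> dict:
    """Split a Processor column value into CPU and GPU percentages.

    Handles the forms shown in this FAQ: "100% GPU", "100% CPU",
    and a split such as "48%/52% CPU/GPU".
    """
    percents, devices = value.strip().split(" ", 1)
    shares = {"CPU": 0, "GPU": 0}
    # Pair each percentage with the device name in the same position.
    for pct, device in zip(percents.split("/"), devices.split("/")):
        shares[device.strip().upper()] = int(pct.rstrip("%"))
    return shares


for example in ("100% GPU", "100% CPU", "48%/52% CPU/GPU"):
    print(example, "->", parse_processor(example))
```

A GPU share below 100 means part of the model is held in system memory.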
How do I configure Ollama server?
macOS
If Ollama is run as a macOS application, environment variables should be set using launchctl.
Linux
If Ollama is run as a systemd service, environment variables should be set using systemctl.
Add environment variables
For each environment variable, add an Environment line under section [Service].
Windows
On Windows, Ollama inherits your user and system environment variables.
Open environment variables
Edit variables
Edit or create variables for OLLAMA_HOST, OLLAMA_MODELS, etc.
How do I use Ollama behind a proxy?
Use HTTPS_PROXY to redirect outbound requests through the proxy. Ensure the proxy certificate is installed as a system certificate.
Avoid setting HTTP_PROXY. Ollama does not use HTTP for model pulls, only HTTPS. Setting HTTP_PROXY may interrupt client connections to the server.
Using Docker
The Ollama Docker container can be configured to use a proxy:
- Docker Desktop on macOS
- Docker Desktop on Windows
- Docker Desktop on Linux
- Docker daemon with systemd
Using self-signed certificates
Ensure the certificate is installed as a system certificate when using HTTPS. This may require a new Docker image.
Does Ollama send my prompts and answers back to ollama.com?
How do I disable Ollama's cloud features?
Using configuration file
Set disable_ollama_cloud in ~/.ollama/server.json.
Using environment variable
Set the OLLAMA_NO_CLOUD environment variable.
Once cloud features are disabled, Ollama's logs will show Ollama cloud disabled: true.
How can I expose Ollama on my network?
Ollama binds to 127.0.0.1:11434 by default. Change the bind address with the OLLAMA_HOST environment variable.
How can I use Ollama with a proxy server like Nginx?
How can I allow additional web origins to access Ollama?
Ollama allows cross-origin requests from 127.0.0.1 and 0.0.0.0 by default. Additional origins can be configured with OLLAMA_ORIGINS.
Browser extensions
For browser extensions, you’ll need to explicitly allow the extension’s origin pattern.
Where are models stored?
- macOS: ~/.ollama/models
- Linux: /usr/share/ollama/.ollama/models
- Windows: C:\Users\%username%\.ollama\models
Changing the location
Set the OLLAMA_MODELS environment variable to specify a different directory.
On Linux, the ollama user needs read and write access to the specified directory. To assign the directory to the ollama user run sudo chown -R ollama:ollama <directory>.
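The lookup order described in this section can be sketched as follows: an OLLAMA_MODELS value, when set, takes the place of the per-platform default. This is only an illustration. The helper name resolve_models_dir is hypothetical, the platform keys follow Python's platform.system() values, and the default paths are copied unexpanded from this FAQ.

```python
# Default locations as listed in this FAQ, keyed by platform.system() values.
# "~" and "%username%" are left unexpanded on purpose.
DEFAULT_MODEL_DIRS = {
    "Darwin": "~/.ollama/models",
    "Linux": "/usr/share/ollama/.ollama/models",
    "Windows": r"C:\Users\%username%\.ollama\models",
}


def resolve_models_dir(system: str, env: dict) -> str:
    """Return OLLAMA_MODELS if it is set, otherwise the documented default."""
    override = env.get("OLLAMA_MODELS")
    if override:
        return override
    return DEFAULT_MODEL_DIRS[system]


# No override: the documented Linux default is used.
print(resolve_models_dir("Linux", {}))
# With an override: "/data/models" is a placeholder directory.
print(resolve_models_dir("Linux", {"OLLAMA_MODELS": "/data/models"}))
```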
How can I use Ollama in Visual Studio Code?
How do I use Ollama with GPU acceleration in Docker?
Why is networking slow in WSL2 on Windows 10?
Open Network Settings
Go to Control Panel > Networking and Internet > View network status and tasks and click on Change adapter settings on the left panel.
How can I preload a model to get faster response times?
How do I keep a model loaded in memory or make it unload immediately?
Unload immediately
Use the ollama stop command.
Using the API
Use the keep_alive parameter with the /api/generate and /api/chat endpoints. The keep_alive parameter can be set to:
- A duration string (such as "10m" or "24h")
- A number in seconds (such as 3600)
- Any negative number which will keep the model loaded in memory (e.g. -1 or "-1m")
- 0 which will unload the model immediately after generating a response
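To make the keep_alive values above concrete, here is a minimal sketch that builds, but does not send, requests to /api/generate. It assumes the server is on the default address 127.0.0.1:11434. The helper build_generate_request and the model name "example-model" are placeholders, and other request fields such as the prompt are omitted for brevity.

```python
import json
import urllib.request

# Default bind address from this FAQ; adjust if OLLAMA_HOST was changed.
BASE_URL = "http://127.0.0.1:11434"


def build_generate_request(model: str, keep_alive) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/generate with keep_alive set."""
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "example-model" is a placeholder name, not a real model.
keep_loaded = build_generate_request("example-model", -1)     # negative: stay loaded
unload_now = build_generate_request("example-model", 0)       # 0: unload after responding
ten_minutes = build_generate_request("example-model", "10m")  # duration string

for req in (keep_loaded, unload_now, ten_minutes):
    print(req.full_url, req.data.decode("utf-8"))

# To send one against a running server:
# urllib.request.urlopen(keep_loaded)
```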
Using environment variable
Change the default for all models by setting the OLLAMA_KEEP_ALIVE environment variable when starting the Ollama server.
The keep_alive API parameter will override the OLLAMA_KEEP_ALIVE setting.
How do I manage the maximum number of requests the server can queue?
Adjust the maximum number of requests the server will queue by setting OLLAMA_MAX_QUEUE.
How does Ollama handle concurrent requests?
- Multiple models can be loaded at the same time if there’s sufficient available memory
- Parallel request processing for a given model if there’s sufficient memory
Configuration
The following server settings control concurrent request handling:
- OLLAMA_MAX_LOADED_MODELS - Maximum number of models that can be loaded concurrently (default: 3 × number of GPUs or 3 for CPU inference)
- OLLAMA_NUM_PARALLEL - Maximum number of parallel requests each model will process (default: 1)
- OLLAMA_MAX_QUEUE - Maximum number of requests Ollama will queue when busy (default: 512)
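The defaults listed above can be written out as a small sketch. It only restates the documented defaults and shows how an override would replace one of them. The helper names default_max_loaded_models and concurrency_env are hypothetical and do not exist in Ollama.

```python
from typing import Optional


def default_max_loaded_models(num_gpus: int) -> int:
    """Documented default: 3 x number of GPUs, or 3 for CPU inference."""
    return 3 * num_gpus if num_gpus > 0 else 3


def concurrency_env(num_gpus: int, overrides: Optional[dict] = None) -> dict:
    """Return the three settings as environment-style strings."""
    settings = {
        "OLLAMA_MAX_LOADED_MODELS": default_max_loaded_models(num_gpus),
        "OLLAMA_NUM_PARALLEL": 1,
        "OLLAMA_MAX_QUEUE": 512,
    }
    # Explicit overrides replace the documented defaults.
    settings.update(overrides or {})
    return {name: str(value) for name, value in settings.items()}


print(concurrency_env(num_gpus=2))
print(concurrency_env(num_gpus=0, overrides={"OLLAMA_NUM_PARALLEL": 4}))
```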
How does Ollama load models on multiple GPUs?
- If the model will entirely fit on any single GPU, Ollama will load the model on that GPU (provides best performance)
- If the model does not fit entirely on one GPU, it will be spread across all available GPUs
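The two rules above can be pictured with a simplified sketch. This is not Ollama's actual scheduler: the function place_model is hypothetical, sizes are in arbitrary units such as GiB, and picking the first GPU that fits is an assumption made for the example.

```python
from typing import List


def place_model(model_size: float, gpu_free: List[float]) -> List[int]:
    """Return the indices of the GPUs that would hold the model.

    If any single GPU has room for the whole model, use only that GPU;
    otherwise spread the model across all available GPUs.
    This sketch does not check whether the combined memory is sufficient.
    """
    for index, free in enumerate(gpu_free):
        if model_size <= free:
            return [index]
    return list(range(len(gpu_free)))


print(place_model(10, [8, 24]))  # fits on the second GPU -> [1]
print(place_model(30, [8, 24]))  # fits on neither alone -> [0, 1]
```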
How can I enable Flash Attention?
Set the OLLAMA_FLASH_ATTENTION environment variable.
How can I set the quantization type for the K/V cache?
Set the OLLAMA_KV_CACHE_TYPE environment variable.
Available quantization types
- f16 - High precision and memory usage (default)
- q8_0 - 8-bit quantization, uses ~1/2 the memory of f16 with minimal quality loss (recommended)
- q4_0 - 4-bit quantization, uses ~1/4 the memory of f16 with small-medium quality loss
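For a rough sense of the savings, the approximate ratios above can be applied to a known f16 cache size. The sketch below uses only those ratios. The function name estimate_kv_cache_memory and the 4 GiB starting figure are illustrative assumptions, not measured values.

```python
# Approximate memory relative to f16, using the ratios listed in this FAQ.
KV_CACHE_RELATIVE_MEMORY = {"f16": 1.0, "q8_0": 0.5, "q4_0": 0.25}


def estimate_kv_cache_memory(f16_size: float, cache_type: str) -> float:
    """Scale an f16 K/V cache size by the approximate ratio for cache_type."""
    return f16_size * KV_CACHE_RELATIVE_MEMORY[cache_type]


f16_gib = 4.0  # hypothetical f16 cache size in GiB
for cache_type in ("f16", "q8_0", "q4_0"):
    print(cache_type, estimate_kv_cache_memory(f16_gib, cache_type), "GiB")
```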
Where can I find my Ollama Public Key?
- Push models to Ollama
- Pull private models from Ollama to your machine
- Run models hosted in Ollama Cloud
How to add the key
- Sign in via the Settings page in the Mac and Windows App
- Sign in via CLI
Key locations
| OS | Path to id_ed25519.pub |
|---|---|
| macOS | ~/.ollama/id_ed25519.pub |
| Linux | /usr/share/ollama/.ollama/id_ed25519.pub |
| Windows | C:\Users\<username>\.ollama\id_ed25519.pub |
How can I stop Ollama from starting when I login to my computer?
Windows
In Task Manager go to the Startup apps tab, search for ollama then click Disable.
macOS
Open Settings and search for “Login Items”, find the Ollama entry under Allow in the Background, then click the slider to disable.