mirror of https://github.com/ollama/ollama.git
# Deploy Ollama to Kubernetes

## Prerequisites
- Ollama: https://ollama.com/download
- Kubernetes cluster. This example will use Google Kubernetes Engine.
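If you don't already have a cluster, one can be provisioned on GKE with the `gcloud` CLI. A minimal sketch, assuming the Google Cloud SDK is installed and a project is configured; the cluster name, zone, and machine type below are placeholders, not values from this example:

```shell
# Hypothetical cluster setup for this walkthrough; adjust name, zone,
# and machine type for your project before running.
gcloud container clusters create ollama-demo \
  --zone us-central1-a \
  --num-nodes 1 \
  --machine-type e2-standard-4

# Fetch credentials so kubectl targets the new cluster.
gcloud container clusters get-credentials ollama-demo --zone us-central1-a
```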

## Steps
1. Create the Ollama namespace, daemon set, and service:

   ```shell
   kubectl apply -f cpu.yaml
   ```

2. Port forward the Ollama service to connect and use it locally:

   ```shell
   kubectl -n ollama port-forward service/ollama 11434:80
   ```

3. Pull and run a model, for example `orca-mini:3b`:

   ```shell
   ollama run orca-mini:3b
   ```
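With the port-forward from step 2 still running, the service can also be exercised over plain HTTP. A quick check, assuming the forward is active on `localhost:11434`:

```shell
# Requires the port-forward from the previous step to be running.
# A GET on the root path returns a short status message.
curl http://localhost:11434/

# Generate a completion through the REST API; the response streams
# as JSON lines until the generation is done.
curl http://localhost:11434/api/generate -d '{
  "model": "orca-mini:3b",
  "prompt": "Why is the sky blue?"
}'
```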

## (Optional) Hardware Acceleration
Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin). Follow the link for more details.
Once configured, create a GPU-enabled Ollama deployment:

```shell
kubectl apply -f gpu.yaml
```
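The key difference in a GPU deployment is a GPU resource request on the container, which the device plugin advertises to the scheduler. An illustrative fragment of what such a container spec typically contains; field values here are assumptions for the sketch, not copied from `gpu.yaml`:

```yaml
# Illustrative only; see gpu.yaml in this directory for the actual spec.
containers:
  - name: ollama
    image: ollama/ollama
    resources:
      limits:
        nvidia.com/gpu: 1  # requires NVIDIA's k8s-device-plugin on the node
```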