mirror of https://github.com/ollama/ollama.git
With old memory estimates, it's currently impossible to load more than one model at a time when no GPUs are available. This is because the check for whether we need to evict a model looks to see if all layers of the new model can be loaded onto GPUs, which is never true if there are no GPUs. Before the memory management changes, there was a special code path for CPU-only systems. This problem does not exist with new memory estimates. Fixes #11974
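The failure mode described above can be illustrated with a minimal sketch. This is not the actual ollama scheduler code; the names (`layersFitOnGPUs`, `gpu`, `freeVRAM`) and the layer sizes are hypothetical. It only shows why a "can all layers fit on the GPUs?" condition can never be satisfied on a CPU-only machine, so such a check would always conclude that the currently loaded model must be evicted first.

```go
// Hypothetical sketch of the eviction check described above; not ollama's real code.
package main

import "fmt"

type gpu struct {
	freeVRAM uint64
}

// layersFitOnGPUs mimics the old-style check: a new model may be loaded
// alongside an existing one only if all of its layers fit in GPU memory.
func layersFitOnGPUs(layerSizes []uint64, gpus []gpu) bool {
	if len(gpus) == 0 {
		// With no GPUs there is no VRAM at all, so the condition below can
		// never hold and the caller always decides it must evict first.
		return false
	}
	var need, free uint64
	for _, s := range layerSizes {
		need += s
	}
	for _, g := range gpus {
		free += g.freeVRAM
	}
	return need <= free
}

func main() {
	layers := []uint64{512 << 20, 512 << 20} // two illustrative 512 MiB layers

	// CPU-only system: always false, so a second model can never be admitted.
	fmt.Println(layersFitOnGPUs(layers, nil))

	// System with a 2 GiB GPU: the layers fit, no eviction needed.
	fmt.Println(layersFitOnGPUs(layers, []gpu{{freeVRAM: 2 << 30}}))
}
```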
- llm_darwin.go
- llm_linux.go
- llm_windows.go
- memory.go
- memory_test.go
- server.go
- server_test.go
- status.go