ollama/llm
Jesse Gross 073fa31df5 llm: Don't always evict models in CPU-only mode
With the old memory estimates, it's currently impossible to load more
than one model at a time when no GPUs are available. This is because
the check for whether we need to evict a model tests whether all
layers of the new model can be loaded onto GPUs, which is never true
if there are no GPUs. Before the memory management changes, there
was a special code path for CPU-only systems.

This problem does not exist with the new memory estimates.

Fixes #11974
2025-08-20 14:31:02 -07:00
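
The decision described above can be sketched in Go. This is a minimal
illustration under assumed names (needsEviction, estimate, and its
fields are hypothetical, not the actual ollama/llm API); it shows only
the shape of the logic: without the zero-GPU special case, the
all-layers-fit test can never pass on a CPU-only system, so every load
forces an eviction.

package main

import "fmt"

// estimate is a hypothetical stand-in for the scheduler's memory
// estimate of a model that is about to be loaded.
type estimate struct {
	totalLayers int // layers in the new model
	gpuLayers   int // layers that fit on the available GPUs
}

// needsEviction reports whether an already-loaded model must be
// evicted before the new one can be scheduled.
func needsEviction(e estimate, numGPUs int) bool {
	if numGPUs == 0 {
		// CPU-only: no layer can ever be "loaded onto a GPU", so
		// the all-layers-fit check below would always fail and
		// force an eviction. Skip it and let CPU memory
		// accounting decide instead.
		return false
	}
	// GPU path: evict only if the new model does not fully fit.
	return e.gpuLayers < e.totalLayers
}

func main() {
	e := estimate{totalLayers: 32, gpuLayers: 0}
	// With no GPUs, loading a second model no longer forces eviction.
	fmt.Println(needsEviction(e, 0)) // false
}

Here the CPU-only branch mirrors the special code path that existed
before the memory management changes; with the new estimates, layer
placement is computed in a way that makes this special case
unnecessary.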
llm_darwin.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_linux.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_windows.go win: lint fix (#10571) 2025-05-05 11:08:12 -07:00
memory.go llm: Don't always evict models in CPU-only mode 2025-08-20 14:31:02 -07:00
memory_test.go llm: New memory management 2025-08-14 15:24:01 -07:00
server.go llm: Don't always evict models in CPU-only mode 2025-08-20 14:31:02 -07:00
server_test.go llm: New memory management 2025-08-14 15:24:01 -07:00
status.go Improve crash reporting (#7728) 2024-11-19 16:26:57 -08:00