ollama

Commit Graph

Author	SHA1	Message	Date
Daniel Hiltgen	c68f367ef6	Update GGML to b6646 (#12245) Notable EOLs with this change: - MacOS v12 and v13 are no longer supported (v14+ required) - AMD gfx900 and gfx906 are no longer supported	2025-10-02 14:47:10 -07:00
Jesse Gross	9d97e6a9f1	ggml: Avoid allocating CUDA primary context on unused GPUs The recent memory management changes caused all GPUs to be visible to the runner, regardless of whether they are ultimately used. This caused CUDA devices to allocate a primary context (~300 MB VRAM) on each GPU, for each model. This is unnecessary, so we can both avoid touching GPUs that we exclude in the early stage of allocation and free the memory for any that we touch but don't use. The issue will continue to exist for the old engine, since it touches all devices during initialization.	2025-08-27 16:24:18 -07:00

2 Commits