mirror of https://github.com/ollama/ollama.git
The GGML CUDA backend allocates additional memory for intermediate results during computation. This memory isn't currently allocated during worst-case graph reservation and is therefore not included in scheduling. Because these buffers can grow with context length, that omission can lead to a crash. This change extends the memory allocation system down a layer, from the GGML graph to the CUDA layer, preallocating the worst-case memory there as well. Fixes #11753
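To make the idea concrete, here is a minimal sketch in Go. This is not ollama's actual implementation: `backend`, `reserve`, `intermediateBytes`, and the `perToken` constant are all hypothetical stand-ins for the CUDA-layer scratch allocation described above. The point it illustrates is sizing the scratch buffer once, from the worst case, so later growth with context length stays within the reservation that scheduling already accounted for.

```go
package main

import "fmt"

// backend stands in for the CUDA backend; scratch models a device-side
// scratch buffer for intermediate results. Hypothetical, for illustration.
type backend struct {
	scratch []byte
}

// reserve preallocates scratch memory for the worst case. In the real
// backend this would be a device allocation sized from a worst-case graph.
func (b *backend) reserve(worstCaseBytes int) {
	if worstCaseBytes > len(b.scratch) {
		b.scratch = make([]byte, worstCaseBytes)
	}
}

// intermediateBytes is a stand-in for the per-operation intermediate-result
// sizing that grows with context length.
func intermediateBytes(contextLen int) int {
	const perToken = 4096 // hypothetical bytes of scratch per token
	return contextLen * perToken
}

func main() {
	const maxContext = 8192
	b := &backend{}

	// Reserve for the worst case (maximum context) during graph
	// reservation, so scheduling accounts for this memory up front.
	b.reserve(intermediateBytes(maxContext))

	// Requests at any context length now fit within the reservation,
	// instead of forcing a mid-inference allocation that could fail.
	for _, ctx := range []int{512, 4096, 8192} {
		need := intermediateBytes(ctx)
		fmt.Printf("ctx=%d need=%d fits=%v\n", ctx, need, need <= len(b.scratch))
	}
}
```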
Files changed:
ggml/
backend.go