ollama/ml
Jesse Gross 3d0b1734c0 ggml: Preallocate CUDA pool memory
The GGML CUDA backend allocates additional memory for intermediate
results during calculation. This memory isn't currently allocated
during worst case graph reservation and is therefore not included in
scheduling. Because these buffers can grow with context length, they
could later exceed the reserved budget and crash.

This extends the memory allocation system one layer down, from the
GGML graph to the CUDA layer, preallocating the worst case memory
there as well.

Fixes #11753
2025-09-30 15:04:43 -07:00
backend      ggml: Preallocate CUDA pool memory        2025-09-30 15:04:43 -07:00
nn           use split activations when possible (#12293)  2025-09-16 09:51:19 -07:00
backend.go   ggml: Remove allocation status reporting   2025-09-30 15:04:43 -07:00