ollama/ml
Jesse Gross 3d0b1734c0 ggml: Preallocate CUDA pool memory
The GGML CUDA backend allocates additional memory for intermediate
results during calculation. This memory isn't currently allocated
during worst case graph reservation and is therefore not included in
scheduling. Because these buffers can grow with context length, they
could later exceed the reserved budget and crash.

This extends the memory allocation system one layer down, from the
GGML graph to the CUDA layer, preallocating the worst case memory
there as well.

Fixes #11753
2025-09-30 15:04:43 -07:00
backend      ggml: Preallocate CUDA pool memory        2025-09-30 15:04:43 -07:00
nn           use split activations when possible (#12293)  2025-09-16 09:51:19 -07:00
backend.go   ggml: Remove allocation status reporting   2025-09-30 15:04:43 -07:00