ollama/fs/ggml
Jesse Gross f560bd077f llm: Use Ollama engine memory layouts for both old and new engines
Currently, both the old and new engines have their own code to
calculate how much memory a model requires and to lay out its
layers onto GPUs. This change reuses the new engine's layout code
for the old engine as well, bringing the two closer together. The
old engine continues to use its current method of estimating
required memory.

This reduces maintenance effort and improves consistency, as new
features only need to be implemented in one place. The newer code
is also more accurate, especially with multiple GPUs.
2025-11-11 13:11:08 -08:00
ggml.go llm: Use Ollama engine memory layouts for both old and new engines 2025-11-11 13:11:08 -08:00
ggml_test.go ggml: fix crash for array head counts 2025-04-27 11:38:06 -07:00
gguf.go fs(ggml): fill in arch prefix if necessary (#12646) 2025-10-20 16:42:18 -07:00
gguf_test.go fs(ggml): fill in arch prefix if necessary (#12646) 2025-10-20 16:42:18 -07:00
type.go fs/ggml: fix function name in comment (#12630) 2025-10-15 21:53:38 -07:00