mirror of https://github.com/ollama/ollama.git
Currently, both the old and new engines have code to calculate how much memory a model requires and to lay out its layers onto GPUs. This change reuses the new engine's layout code for the old engine as well, bringing the two closer together. The old engine continues to use its current method of estimating required memory. This reduces maintenance effort and improves consistency, as new features only need to be implemented in one place. The newer code is also more accurate, especially with multiple GPUs.

| Name |
|---|
| ggml.go |
| ggml_test.go |
| gguf.go |
| gguf_test.go |
| type.go |
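
To make the idea of "laying out layers onto GPUs" in the commit message concrete, here is a minimal sketch in Go. It is not the code from this repository: the `gpu` type, the `assignLayers` function, and the sizes used are all hypothetical, and the sketch assumes the only inputs are a per-layer size and a per-device free-memory figure. It fills each GPU in order and reports how many layers did not fit on any device.

```go
package main

import "fmt"

// gpu is a hypothetical description of one device.
// It is an assumption for this sketch, not a type from the ollama codebase.
type gpu struct {
	name string
	free uint64 // free memory, in arbitrary units
}

// assignLayers places layers on GPUs in order, filling each device before
// moving on to the next one. It returns, for each GPU, the indices of the
// layers placed on it, plus the number of trailing layers that fit nowhere.
func assignLayers(layerSizes []uint64, gpus []gpu) (placed [][]int, spilled int) {
	placed = make([][]int, len(gpus))
	g := 0            // index of the GPU currently being filled
	used := uint64(0) // memory already assigned on that GPU

	for i, size := range layerSizes {
		// Advance to the next GPU until the layer fits or no GPUs remain.
		for g < len(gpus) && used+size > gpus[g].free {
			g++
			used = 0
		}
		if g == len(gpus) {
			// Out of devices: this layer and all later ones are not placed.
			return placed, len(layerSizes) - i
		}
		placed[g] = append(placed[g], i)
		used += size
	}
	return placed, 0
}

func main() {
	layers := []uint64{400, 400, 400, 400, 400}
	gpus := []gpu{{"gpu0", 1000}, {"gpu1", 900}}

	placed, spilled := assignLayers(layers, gpus)
	fmt.Println(placed, spilled) // prints "[[0 1] [2 3]] 1"
}
```

The sketch ignores everything except per-layer sizes, so it only illustrates why a single shared layout routine is attractive: any improvement to the placement logic is written once and applies to every caller that supplies its own memory estimates.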