# llama
This package provides Go bindings to llama.cpp.
## Vendoring
Ollama vendors llama.cpp and ggml. While we generally strive to contribute changes back upstream to avoid drift, we carry a small set of patches which are applied to the tracking commit.
If you update the vendoring code, start by running the following command to establish the tracking llama.cpp repo in the `./vendor/` directory:

```shell
make -f Makefile.sync apply-patches
```
## Updating Base Commit
### Pin to new base commit
To change the base commit, update `FETCH_HEAD` in `Makefile.sync`.
When updating to a newer base commit, the existing patches may not apply cleanly and require manual merge resolution.
Start by applying the patches. If any of the patches have conflicts, `git am` will stop at the first failure.
```shell
make -f Makefile.sync apply-patches
```
If there are conflicts, you will see an error message. Resolve the conflicts in `./vendor/`, continue the patch series with `git am --continue`, and rerun `make -f Makefile.sync apply-patches`. Repeat until all patches apply successfully.
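The `git am` mechanics behind `apply-patches` can be tried out in a throwaway repo. This is only a sketch of the underlying workflow: the repo name, file names, and commit messages below are made up for illustration and are not Ollama's real layout.

```shell
# Demonstrate the format-patch / git am round trip in a scratch repo.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name "Your Name"

# A base commit standing in for the upstream tracking commit:
echo base > file.txt
git add file.txt && git commit -qm "base"

# A local change, exported as a patch file (what format-patches produces):
echo change > file.txt
git commit -qam "local fix"
git format-patch -1 -o ../patches >/dev/null

# Rewind, then re-apply the patch series with git am (what apply-patches does).
# If a patch conflicted here, you would fix the files, `git add` them,
# and run `git am --continue` to resume the series.
git reset -q --hard HEAD~1
git am -q ../patches/*.patch
```

After the series applies, the "local fix" commit is back on top of the base commit, exactly as if it had been committed directly.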
Once all patches are applied, commit the changes to the tracking repository.
```shell
make -f Makefile.sync format-patches sync
```
## Generating Patches
When working on new fixes or features that impact vendored code, use the following workflow. First, get a clean tracking repo with all current patches applied:
```shell
make -f Makefile.sync clean apply-patches
```
Iterate until you're ready to submit PRs. Once your code is ready, commit a change in the `./vendor/` directory, then generate the patches for Ollama with:

```shell
make -f Makefile.sync format-patches
```
In your `./vendor/` directory, create a branch, cherry-pick the new commit onto that branch, and submit a PR upstream to llama.cpp.
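The branch-and-cherry-pick step can be sketched in a throwaway repo. The branch names, file name, and commit messages below are illustrative only; in the real workflow the branch would be based on llama.cpp's upstream default branch and pushed to your fork before opening the PR.

```shell
# Demonstrate carrying a single commit onto a fresh branch via cherry-pick.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name "Your Name"

# A base commit shared with upstream:
echo base > ggml.c
git add ggml.c && git commit -qm "upstream base"
git branch upstream-master          # stands in for llama.cpp's default branch

# Your vendored change on the working branch:
echo fix > ggml.c
git commit -qam "my vendored fix"
sha=$(git rev-parse HEAD)

# Create the PR branch from upstream and carry only your commit onto it:
git switch -q upstream-master
git switch -q -c my-upstream-fix
git cherry-pick "$sha"
# In the real workflow, `git push <your-fork> my-upstream-fix` would follow.
```

The cherry-pick applies cleanly here because the commit's parent matches the upstream branch tip; in practice upstream may have moved on, and you would resolve any conflicts the same way as during `apply-patches`.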
Commit the changes in the Ollama repo and submit a PR to Ollama; it will include the vendored code update with your change, along with the patches.
After your upstream PR is merged, follow the Updating Base Commit instructions above, but first remove your patch before running `apply-patches`, since the new base commit already contains your change.