Commit Graph

31 Commits

Author SHA1 Message Date
Devon Rifkin 47991940d4 add qwen3-coder tool support
The format qwen3-coder uses is relatively unique, both in rendering and
in parsing. To implement parsing, I wrote a custom parser in similar
style to harmony. For the rendering, I found that the logic would be
much more difficult to follow in a template, so I introduced the concept
of a built-in renderer that uses go code, rather than a template to
generate prompts.

I set us up for future built-in parsers and renderers by making it so
they can be specified in a Modelfile like so:

```
RENDERER "qwen3-coder"
PARSER "qwen3-coder"
```

These need to be provided explicitly because the architecture alone is
not enough to understand what format the model expects to receive, and
what format we expect it to output (e.g., qwen3-coder is `qwen3moe`,
which includes other qwen3-family models as well)

I haven't converted harmony to be one of these "built-ins" yet, since
some of it is in flux with the changes @ParthSareen has been making to
move harmony to the runner. It is likely that many other built-ins will
need to move to the runner as well, but I'm able to slightly defer that
decision since qwen3-coder doesn't have thinking (and therefore doesn't
need to be in the runner to make structured outputs work). I expect to
unify harmony with this approach very soon.

Whether a particular model supports tools or thinking was previously
inferred from templates, but without a template we now also use the
parser itself to declare what it supports. If we have future models that
re-use the same parsing format, but have different capabilities, we'll
want to parameterize them and give them different names to be specified
as a `PARSER`.

Misc changes:

- I worked on the renderer by diffing outputs from the reference
  implementation and ours. To make it easier to do this, I extended
  <https://github.com/ollama/ollama/pull/11875> to also support
  returning the prompt via the openai compat layer
2025-09-15 11:33:47 -07:00
Jeffrey Morgan fa9973cd7f
api: remove unused sampling parameters (#10581) 2025-05-08 08:31:08 -07:00
Daniel Hiltgen 424810450f
Move quantization to new backend (#10363)
* Move quantization logic to GGML via new backend

This moves the model aware logic to Go code and calls GGMLs quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.
2025-05-06 11:20:48 -07:00
Jeffrey Morgan 3b2d2c8326
api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options
2025-05-05 14:54:40 -07:00
Parth Sareen 00ebda8cc4
Revert "parser: remove role validation from Modelfile parser" (#9917)
This reverts commit ffbfe833da.
2025-03-21 12:38:09 -07:00
rylativity ffbfe833da
parser: remove role validation from Modelfile parser (#9874)
* updates parser/parser.go to allow arbitrary roles in Modelfile MESSAGE blocks
2025-03-20 13:11:17 -07:00
Michael Yang 58245413f4
next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
frob 294b6f5a22
docs: remove tfs_z option from documentation (#8515) 2025-01-21 09:28:59 -08:00
Jeffrey Morgan 42cf4db601
parser: fix parsing Modelfiles with multiple FROM commands (#8449) 2025-01-16 00:14:04 -08:00
Patrick Devine 32bd37adf8
make the modelfile path relative for `ollama create` (#8380) 2025-01-10 16:14:08 -08:00
Patrick Devine 86a622cbdc
Update the /api/create endpoint to use JSON (#7935)
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
2024-12-31 18:02:30 -08:00
Stefan Weil abfdc4710f
all: fix typos in documentation, code, and comments (#7021) 2024-12-10 12:58:06 -08:00
Patrick Devine 4efb98cb4f
add line numbers for parser errors (#7326) 2024-11-14 13:59:44 -08:00
Jesse Gross a909417602 runner.go: Remove unused arguments
Now that server.cpp is gone, we don't need to keep passing arguments
that were only ignored and only kept for compatibility.
2024-11-06 13:32:18 -08:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Tibor Schmidt f3d7a481b7
feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
Josh Yan 7e571f95f0 trimspace test case 2024-07-01 11:07:48 -07:00
Josh Yan 26e4e66faf updated parsefile test 2024-07-01 09:43:49 -07:00
Josh Yan 9bd00041fa trim all params 2024-06-27 11:18:38 -07:00
Josh Yan 4e986a823c unquote, trimp space 2024-06-27 10:59:15 -07:00
Michael Yang cd234ce22c parser: add test for multibyte runes 2024-06-13 13:07:42 -07:00
Michael Yang e40145a39d lint 2024-06-04 11:13:30 -07:00
Patrick Devine ccdf0b2a44
Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
Michael Yang 119589fcb3 rename parser to model/file 2024-05-01 09:53:50 -07:00
Michael Yang bd8eed57fc fix parser name 2024-05-01 09:52:54 -07:00
Michael Yang 176ad3aa6e parser: add commands format 2024-05-01 09:52:54 -07:00
Michael Yang 8907bf51d2 fix multiline 2024-05-01 09:52:54 -07:00
Michael Yang abe614c705 tests 2024-05-01 09:52:54 -07:00
Michael Yang c0a00f68ae refactor modelfile parser 2024-05-01 09:52:54 -07:00
Patrick Devine 7c40a67841
Save and load sessions (#2063) 2024-01-25 12:12:36 -08:00
Patrick Devine 238ac5e765
Add unit tests for Parser (#1815) 2024-01-05 14:04:31 -08:00