63 Commits

Author SHA1 Message Date
Michael Yang 0dabb4ef6a
skip tokenizer.model if possible (#11050)
if tokenizer.json is already copied, skip tokenizer.model
2025-06-11 12:10:35 -07:00
Jeffrey Morgan fa9973cd7f
api: remove unused sampling parameters (#10581) 2025-05-08 08:31:08 -07:00
Daniel Hiltgen 424810450f
Move quantization to new backend (#10363)
* Move quantization logic to GGML via new backend

This moves the model-aware logic to Go code and calls GGML's quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.
2025-05-06 11:20:48 -07:00
Jeffrey Morgan 3b2d2c8326
api: remove unused or unsupported api options (#10574)
Some options listed in api/types.go are not supported in
newer models, or have been deprecated in the past. This is
the first of a series of PRs to clean up the API options
2025-05-05 14:54:40 -07:00
Michael Yang d931ee8f22
create blobs in parallel (#10135)
* default max term height
* error on out of tree files
2025-05-05 11:59:26 -07:00
Michael Yang 16fca86c4a digest files in parallel 2025-04-07 09:46:31 -07:00
Bruce MacDonald 6bd0a983cd model: support for mistral-small in the ollama runner
Mistral is a popular research lab making open source models. This updates
the forward pass of llama architecture models to support both llama models
and mistral models by accounting for additional metadata present in mistral
models, and finding the correct dimensions for the output projection.
2025-04-03 16:57:36 -07:00
Parth Sareen 00ebda8cc4
Revert "parser: remove role validation from Modelfile parser" (#9917)
This reverts commit ffbfe833da.
2025-03-21 12:38:09 -07:00
rylativity ffbfe833da
parser: remove role validation from Modelfile parser (#9874)
* updates parser/parser.go to allow arbitrary roles in Modelfile MESSAGE blocks
2025-03-20 13:11:17 -07:00
Michael Yang 58245413f4
next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high-level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
frob 294b6f5a22
docs: remove tfs_z option from documentation (#8515) 2025-01-21 09:28:59 -08:00
Jeffrey Morgan 42cf4db601
parser: fix parsing Modelfiles with multiple FROM commands (#8449) 2025-01-16 00:14:04 -08:00
Patrick Devine 2539f2dbf9
Fix absolute path names + gguf detection (#8428) 2025-01-14 19:01:24 -08:00
Patrick Devine 32bd37adf8
make the modelfile path relative for `ollama create` (#8380) 2025-01-10 16:14:08 -08:00
Jeffrey Morgan 1deafd8254
llama: update vendored code to commit 46e3556 (#8308) 2025-01-08 11:22:01 -08:00
Patrick Devine 86a622cbdc
Update the /api/create endpoint to use JSON (#7935)
Replaces `POST /api/create` to use JSON instead of a Modelfile.

This is a breaking change.
2024-12-31 18:02:30 -08:00
Stefan Weil abfdc4710f
all: fix typos in documentation, code, and comments (#7021) 2024-12-10 12:58:06 -08:00
Patrick Devine 4efb98cb4f
add line numbers for parser errors (#7326) 2024-11-14 13:59:44 -08:00
Jesse Gross a909417602 runner.go: Remove unused arguments
Now that server.cpp is gone, we don't need to keep passing arguments
that were only ignored and only kept for compatibility.
2024-11-06 13:32:18 -08:00
Michael Yang b732beba6a lint 2024-08-01 17:06:06 -07:00
Tibor Schmidt f3d7a481b7
feat: add support for min_p (resolve #1142) (#1825) 2024-07-27 14:37:40 -07:00
Josh Yan 7e571f95f0 trimspace test case 2024-07-01 11:07:48 -07:00
Josh Yan 26e4e66faf updated parsefile test 2024-07-01 09:43:49 -07:00
Josh Yan 9bd00041fa trim all params 2024-06-27 11:18:38 -07:00
Josh Yan 4e986a823c unquote, trimp space 2024-06-27 10:59:15 -07:00
Michael Yang d528e1af75 fix utf16 for multibyte runes 2024-06-13 13:07:42 -07:00
Michael Yang cd234ce22c parser: add test for multibyte runes 2024-06-13 13:07:42 -07:00
Michael Yang 20b9f8e6f4 Revert "proper utf16 support"
This reverts commit 66ab48772f.

this change broke utf-8 scanning of multi-byte runes
2024-06-13 10:22:16 -07:00
Michael Yang 66ab48772f proper utf16 support 2024-06-05 13:11:50 -07:00
Michael Yang e40145a39d lint 2024-06-04 11:13:30 -07:00
Patrick Devine ccdf0b2a44
Move the parser back + handle utf16 files (#4533) 2024-05-20 11:26:45 -07:00
Michael Yang 119589fcb3 rename parser to model/file 2024-05-01 09:53:50 -07:00
Michael Yang bd8eed57fc fix parser name 2024-05-01 09:52:54 -07:00
Michael Yang 9cf0f2e973 use parser.Format instead of templating modelfile 2024-05-01 09:52:54 -07:00
Michael Yang 176ad3aa6e parser: add commands format 2024-05-01 09:52:54 -07:00
Michael Yang 4d08363580 comments 2024-05-01 09:52:54 -07:00
Michael Yang 8907bf51d2 fix multiline 2024-05-01 09:52:54 -07:00
Michael Yang abe614c705 tests 2024-05-01 09:52:54 -07:00
Michael Yang 238715037d linting 2024-05-01 09:52:54 -07:00
Michael Yang c0a00f68ae refactor modelfile parser 2024-05-01 09:52:54 -07:00
Patrick Devine 7c40a67841
Save and load sessions (#2063) 2024-01-25 12:12:36 -08:00
Daniel Hiltgen fedd705aea Mechanical switch from log to slog
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Patrick Devine 238ac5e765
Add unit tests for Parser (#1815) 2024-01-05 14:04:31 -08:00
Michael Yang 38fe1a368b fix: trim space in modelfile fields 2023-12-05 11:57:29 -08:00
Bruce MacDonald a0c3e989de
deprecate modelfile embed command (#759) 2023-10-16 11:07:37 -04:00
Michael Yang 6517bcc53c
Merge pull request #290 from jmorganca/add-adapter-layers
implement loading ggml lora adapters through the modelfile
2023-08-10 17:23:01 -07:00
Michael Yang 21e6197c0b
Merge pull request #322 from jmorganca/no-comment-warning
no warning on comments
2023-08-10 16:24:41 -07:00
Michael Yang 20bf000e55 no warning on comments 2023-08-10 16:22:38 -07:00
Michael Yang 40d0c4a1dc length check for parameters 2023-08-10 16:09:02 -07:00
Michael Yang 6de5d032e1 implement loading ggml lora adapters through the modelfile 2023-08-10 09:23:39 -07:00