Commit Graph

17 Commits

Author SHA1 Message Date
Daniel Hiltgen 1fdb351c37
New engine: vision models and auto-fallback (#9113)
* Include unified vision layers in memory prediction

For newer vision models with a single gguf, include
the projection estimates.

* Adjust CLI to handle both styles of vision model metadata

* Wire up new tokenizers for new engine

If we're loading the new engine, utilize the new model
text processor instead of calling into cgo wrappers for
llama.cpp.  This also cleans up some tech debt from the
older tokenization flow for the C++ server which was
no longer used.

This also adjusts the grammar handling logic to pass
through to the new engine instead of utilizing the cgo
schema to grammar call.

* Lay foundation for auto selection of new engine
2025-03-04 09:03:46 -08:00
Jesse Gross ed443a0393 Runner for Ollama engine
This provides integration with the new Ollama engine
(5824541 next ollama runner (#7913)) and the rest of the Ollama
infrastructure such as the runner and Ollama server.

In addition, it also builds out the KV cache infrastructure to
support requirements of how Ollama runs models such as:
 - Parallel processing
 - Memory management for defragmentation and shifting
 - Multi-modal modals

Both old and new engines continue to be supported. By default, only
the old engine is used. To enable the new engine:

Start the server with the OLLAMA_NEW_ENGINE environment variable set:
OLLAMA_NEW_ENGINE=1 ./ollama serve

Start a model that is supported by the Ollama engine. This one is Llama 3.1 8b Q4_K_M:
./ollama run jessegross/llama3.1
2025-02-13 17:09:26 -08:00
Jesse Gross 6945617af5 models: Move model into their own directory
This allows there to be a file that is a list of models that is
not mixed into the runner code.
2025-02-13 17:09:26 -08:00
Patrick Devine 8c9fb8eb73
imageproc mllama refactor (#7537)
Refactor mllama image processing code, and add pixtral and qwen2vl
2024-12-14 19:50:15 -08:00
Jesse Gross 900f64e6be prompt: Don't trim whitespace from prompts
New lines can be an important part of a user's prompt and trimming
it can alter the results. We previously only trimmed prompts with
images but refactoring brought this behavior to all prompts, where
it became more noticable.

The /generate endpoint adds less whitespace and therefore doesn't
need to trim it out - this brings the same behavior to /chat.

Thanks to @gabe-l-hart for spotting the issue!

Fixes #7795
2024-12-09 11:02:55 -08:00
Jesse Gross 34a75102f7 prompt: Use a single token when estimating mllama context size
Currently we assume that images take 768 tokens of context size for
the purposes of clipping old messages that exceed the context window.
However, our mllama implementation stores the full image embedding
in a single token. As a result, there is significant waste of context
space.

Ideally, we would handle this more generically and have the
implementation report the number of tokens. However, at the moment
this would just result in a similar set of 'if' conditions in the
runner plus APIs to report it back. So for now, we just keep this
simple.
2024-11-05 10:11:50 -08:00
Jesse Gross c826e57475 runner.go: Better abstract vision model integration
-Update mllama to take the cross attention state as embeddings in
a batch, more similar to how Llava handles it. This improves
integration with the input cache.
-Pass locations in a prompt for embeddings using tags similar to Llava.
-Abstract interface to vision models so the main runner accesses Clip
and Mllama similarly

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-10-30 14:53:43 -07:00
Patrick Devine c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Michael Yang d02bbebb11 tools 2024-07-15 15:26:16 -07:00
Michael Yang 22c5451fc2
fix system prompt (#5662)
* fix system prompt

* execute template when hitting previous roles

* fix tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-12 21:04:44 -07:00
Michael Yang 2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang 269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Michael Yang 58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
Patrick Devine 1b272d5bcd
change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) 2024-03-26 13:04:17 -07:00
Michael Yang 0e19476b56
prepend image tags (#2789)
instead of appending image tags, prepend them - this generally produces better results
2024-02-29 11:30:14 -08:00
Bruce MacDonald 88622847c6
fix: chat system prompting overrides (#2542) 2024-02-16 14:42:43 -05:00
Jeffrey Morgan 48a273f80b
Fix issues with templating prompt in chat mode (#2460) 2024-02-12 15:06:57 -08:00