ollama

Commit Graph

Author	SHA1	Message	Date
Devon Rifkin	2c2f4deaa9	openai: refactor to split compat layer and middleware This makes the core openai compat layer independent of the middleware that adapts it to our particular gin routes	2025-10-05 14:18:56 -07:00
Devon Rifkin	47991940d4	add qwen3-coder tool support The format qwen3-coder uses is relatively unique, both in rendering and in parsing. To implement parsing, I wrote a custom parser in similar style to harmony. For the rendering, I found that the logic would be much more difficult to follow in a template, so I introduced the concept of a built-in renderer that uses go code, rather than a template to generate prompts. I set us up for future built-in parsers and renderers by making it so they can be specified in a Modelfile like so: ``` RENDERER "qwen3-coder" PARSER "qwen3-coder" ``` These need to be provided explicitly because the architecture alone is not enough to understand what format the model expects to receive, and what format we expect it to output (e.g., qwen3-coder is `qwen3moe`, which includes other qwen3-family models as well) I haven't converted harmony to be one of these "built-ins" yet, since some of it is in flux with the changes @ParthSareen has been making to move harmony to the runner. It is likely that many other built-ins will need to move to the runner as well, but I'm able to slightly defer that decision since qwen3-coder doesn't have thinking (and therefore doesn't need to be in the runner to make structured outputs work). I expect to unify harmony with this approach very soon. Whether a particular model supports tools or thinking was previously inferred from templates, but without a template we now also use the parser itself to declare what it supports. If we have future models that re-use the same parsing format, but have different capabilities, we'll want to parameterize them and give them different names to be specified as a `PARSER`. Misc changes: - I worked on the renderer by diffing outputs from the reference implementation and ours. To make it easier to do this, I extended <https://github.com/ollama/ollama/pull/11875> to also support returning the prompt via the openai compat layer	2025-09-15 11:33:47 -07:00
Michael Yang	feb18cd710	feat: add dimensions field to embed requests (#12242 ) * feat: add field to truncate embeddings * add openai embeddings for dimensions	2025-09-11 10:36:10 -07:00
Michael Yang	91fc3c48e3	openai: remove reasoning as an api.Options (#11993 )	2025-08-20 12:21:42 -07:00
Michael Yang	d0cf6c8281	fix(openai): handle reasoning_effort (#11868 )	2025-08-12 11:02:01 -07:00
Devon Rifkin	735c41f9ca	openai: always provide reasoning We were missing passing along thinking if content was nil (as opposed to empty string) Also added a test for content not being passed, which was the real cause of <https://github.com/ollama/ollama/issues/11704>, since with the way `Content` is typed, not passing it and empty string are distinct	2025-08-06 18:54:20 -07:00
Devon Rifkin	759dd78dd6	openai: when converting role=tool messages, propagate the tool name Added support for converting both `name` and `tool_call_id` fields, which different clients might provide. `name` is a legacy field from the OpenAI completions API. For `tool_call_id` we inspect previous messages and look for a matching tool call ID and grab its name Issue: https://github.com/ollama/ollama/issues/11704	2025-08-06 17:00:24 -07:00
Devon Rifkin	203c137810	openai: allow for content _and_ tool calls in the same message Previously our OpenAI chat completions compat layer assumed that tool calls and content would never be provided together, but this is not a correct assumption. Content is only optional when tool calls are present, but tool calls and content can be provided together Fixes: https://github.com/ollama/ollama/issues/11704	2025-08-06 15:50:30 -07:00
Michael Yang	fa7776fd24	gpt-oss (#11672 ) * bf16 * tests * gpt-oss * enable gptoss for engine * rough estimate * convert to mxfp4 * handle safetensors U8 * clamp glu/linear * update tokenizer * MXFP4 support This implements the Open Compute Microscaling (MX) FP4 format as a tensor type with backend implementations focusing on mulmat and mulmatid on CPU, CUDA, and Metal. * Unit tests for MXFP4 support This exercises various operations and shapes on both CPU and GPU (if detected on the system) * cuda graph * unit test adjustments * cuda: optimize memory access Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4 * mac: fix crash on old macos versions cblas_sgemm is only supported on v13.3 and up, however bf16 is only supported on v14+ so we were falling back to ggml-blas and crashing on bf16 tensors. Checking for the function being null seems to be the simplest way to condittionally avoid registering the backend. * server: Minimum context length for gptoss This model requires a minimum context length of 8192 to function effectively. Users can set higher values through all normal mechanisms but lower values will be silently reset. * ggml: Multiply by numParallel for gptoss sliding window When computing the graph size estimate, the context size is already multiplied by numParallel so estimates reflect that. However, since sliding window models use a smaller, fixed context size, they need to manually take numParallel into account. * gpt-oss integration includes harmony parser and thinking levels, etc. * fix sync * fix tests * fix lint --------- Co-authored-by: Daniel Hiltgen <daniel@ollama.com> Co-authored-by: Jesse Gross <jesse@ollama.com> Co-authored-by: Devon Rifkin <drifkin@drifkin.net>	2025-08-05 12:21:16 -07:00
frob	5e67f4f90e	openai: allow openai endpoint to accept webp images (#11412 ) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-07-16 21:31:49 -07:00
Bruce MacDonald	9876c9faa4	chore(all): replace instances of interface with any (#10067 ) Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.	2025-04-02 09:44:27 -07:00
Anuraag (Rag) Agrawal	10d59d5f90	openai: finish_reason as tool_calls for streaming with tools (#7963 )	2025-02-13 10:20:12 -08:00
Anuraag (Rag) Agrawal	e28f2d4900	openai: return usage as final chunk for streams (#6784 ) * openai: return usage as final chunk for streams --------- Co-authored-by: ParthSareen <parth.sareen@ollama.com>	2024-12-12 17:09:30 -08:00
Blake Mizerany	9039c821a2	llama: preserve field order in user-defined JSON schemas (#8002 ) Previously we decoded and re-encoded JSON schemas during validation, which served no purpose since json.RawMessage already validates JSON syntax. Worse, the re-encoding lost field ordering from the original schema, which affects inference quality during step-by-step reasoning. While fixing this ordering issue by using json.RawMessage directly, testing revealed that schema_to_grammar (from llama.cpp) also fails to preserve field order during grammar generation. This appears to be the root cause of inference degradation. This change prevents us from mangling the user's original schema order, but we still need to address the ordering issue in schema_to_grammar. That will be a separate change. Updates #7978	2024-12-11 14:07:30 -08:00
Parth Sareen	630e7dc6ff	api: structured outputs - chat endpoint (#7900 ) Adds structured outputs to chat endpoint --------- Co-authored-by: Michael Yang <mxyng@pm.me> Co-authored-by: Hieu Nguyen <hieunguyen1053@outlook.com>	2024-12-04 16:31:19 -08:00
Parth Sareen	5f8051180e	Enable index tracking for tools - openai api support (#7888 )	2024-11-29 20:00:09 -08:00
Parth Sareen	ce7455a8e1	api: enable tool streaming (#7836 )	2024-11-27 13:40:57 -08:00
Bruce MacDonald	940e62772e	openai: remove unused error code (#7850 ) The writeError takes a code argument which is no longer used. Remove it for clarity.	2024-11-26 16:08:09 -08:00
frob	06d4fba851	openai: align chat temperature and frequency_penalty options with completion (#6688 )	2024-09-07 09:08:08 -07:00
Yaroslav	da915345d1	openai: don't scale temperature or frequency_penalty (#6514 )	2024-09-06 17:45:45 -07:00
frob	fe91d7fff1	openai: fix "presence_penalty" typo and add test (#6665 )	2024-09-06 01:16:28 -07:00
Michael Yang	b732beba6a	lint	2024-08-01 17:06:06 -07:00
royjhan	6f133a0bdd	OpenAI: Add Usage to `v1/embeddings` (#5886 ) * add prompt tokens to embed response * rm slog * metrics * types * prompt n * clean up * reset submodule * add tokens to v1/embeddings * separate usage	2024-08-01 15:49:37 -07:00
royjhan	365431d406	return tool calls finish reason for openai (#5995 ) * hot fix * backend stream support * clean up * finish reason * move to openai	2024-07-29 13:56:57 -07:00
royjhan	c57317cbf0	OpenAI: Function Based Testing (#5752 ) * distinguish error forwarding * more coverage * rm comment	2024-07-19 11:37:12 -07:00
royjhan	51b2fd299c	adjust openai chat msg processing (#5729 )	2024-07-19 11:19:20 -07:00
royjhan	154f6f45d4	OpenAI: Support Tools (#5614 ) * reopen pr * tools * remove tc from stream for now * ID and Function * openai expects arguments to be a string (#5739) * mutually exclusive content and tool calls * clean up --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-07-16 20:52:59 -07:00
royjhan	0d41623b52	OpenAI: Add Suffix to `v1/completions` (#5611 ) * add suffix * remove todo * remove TODO * add to test * rm outdated prompt tokens info md * fix test * fix test	2024-07-16 20:50:14 -07:00
royjhan	987dbab0b0	OpenAI: /v1/embeddings compatibility (#5285 ) * OpenAI v1 models * Empty List Testing * Add back envconfig * v1/models docs * Remove Docs * OpenAI batch embed compatibility * merge conflicts * integrate with api/embed * ep * merge conflicts * request tests * rm resp test * merge conflict * merge conflict * test fixes * test fn renaming * input validation for empty string --------- Co-authored-by: jmorganca <jmorganca@gmail.com>	2024-07-16 13:36:08 -07:00
royjhan	e9f7f36029	Support image input for OpenAI chat compatibility (#5208 ) * OpenAI v1 models * Refactor Writers * Add Test Co-Authored-By: Attila Kerekes * Credit Co-Author Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com> * Empty List Testing * Use Namespace for Ownedby * Update Test * Add back envconfig * v1/models docs * Use ModelName Parser * Test Names * Remove Docs * Clean Up * Test name Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Add Middleware for Chat and List * Testing Cleanup * Test with Fatal * Add functionality to chat test * Support image input for OpenAI chat * Decoding * Fix message processing logic * openai vision test * type errors * clean up * redundant check * merge conflicts * merge conflicts * merge conflicts * flattening and smaller image * add test * support python and js SDKs and mandate prefixing * clean up --------- Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com> Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-07-13 22:07:45 -07:00
royjhan	4918fae535	OpenAI v1/completions: allow stop token list (#5551 ) * stop token parsing fix * add stop test	2024-07-09 14:01:26 -07:00
royjhan	d626b99b54	OpenAI: v1/completions compatibility (#5209 ) * OpenAI v1 models * Refactor Writers * Add Test Co-Authored-By: Attila Kerekes * Credit Co-Author Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com> * Empty List Testing * Use Namespace for Ownedby * Update Test * Add back envconfig * v1/models docs * Use ModelName Parser * Test Names * Remove Docs * Clean Up * Test name Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Add Middleware for Chat and List * Completions Endpoint * Testing Cleanup * Test with Fatal * Add functionality to chat test * Rename function * float types * type cleanup * cleaning * more cleaning * Extra test cases * merge conflicts * merge conflicts * merge conflicts * merge conflicts * cleaning * cleaning --------- Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com> Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-07-02 16:01:45 -07:00
royjhan	996bb1b85e	OpenAI: /v1/models and /v1/models/{model} compatibility (#5007 ) * OpenAI v1 models * Refactor Writers * Add Test Co-Authored-By: Attila Kerekes * Credit Co-Author Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com> * Empty List Testing * Use Namespace for Ownedby * Update Test * Add back envconfig * v1/models docs * Use ModelName Parser * Test Names * Remove Docs * Clean Up * Test name Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * Add Middleware for Chat and List * Testing Cleanup * Test with Fatal * Add functionality to chat test * OpenAI: /v1/models/{model} compatibility (#5028) * Retrieve Model * OpenAI Delete Model * Retrieve Middleware * Remove Delete from Branch * Update Test * Middleware Test File * Function name * Cleanup * Test Update * Test Update --------- Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com> Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-07-02 11:50:56 -07:00
Jeffrey Morgan	6b800aa7b7	openai: do not set temperature to 0 when setting seed (#5045 )	2024-06-14 13:43:56 -07:00
Michael Yang	e40145a39d	lint	2024-06-04 11:13:30 -07:00
Jeffrey Morgan	41ba3017fd	Fix OpenAI `finish_reason` values when empty (#4368 )	2024-05-11 15:31:41 -07:00
Bruce MacDonald	cfa84b8470	add done_reason to the api (#4235 )	2024-05-09 13:30:14 -07:00
Patrick Devine	1b272d5bcd	change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347 )	2024-03-26 13:04:17 -07:00
Jeffrey Morgan	453f572f83	Initial OpenAI `/v1/chat/completions` API compatibility (#2376 )	2024-02-07 17:24:29 -05:00

39 Commits