Commit Graph

4813 Commits

Author SHA1 Message Date
Eva Ho 3f6d669a65 app/ui: use requestAnimationFrame to prevent bottom line cutoff in streaming thinking display 2025-11-18 15:51:58 -05:00
Michael Yang 440a3823a6
fix(tokenizer): add special tokens to empty inputs (#13091) 2025-11-18 11:16:56 -08:00
Michael Yang 718961de68
migrate to golangci-lint v2 (#13109)
* migrate to golangci-lint v2
* copyloopvar
2025-11-18 11:00:26 -08:00
SamareshSingh 330f62a7fa
docs: add Void Editor to community integrations (#13124)
Void is an open source AI code editor and Cursor alternative that supports
Ollama. It's built on VS Code and allows users to connect directly to Ollama
for private LLM usage without going through a middleman backend.

Key features:
- Open source Cursor alternative
- Direct Ollama integration
- VS Code fork with full compatibility
- Agent mode and MCP support
- Works with any open source model

Fixes #12919

Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>
2025-11-17 19:20:36 -08:00
Grace 584e2d646f
Add deepseek v3.1 (#13063)
* Add mla for flash attention
* Revert to using chunks
2025-11-17 18:03:21 -08:00
Eva H 1fd4cb87b2
app/cmd: restrict ollama:// URL scheme to supported paths (#13120) 2025-11-17 20:10:45 -05:00
Cerussite 4aba2e8b72
discover: Support cgroups cores and memory limitations (#10292)
* Add supports for cgroups cores and memory limitations

* fix compile error and add logs

* remove cpu info log
2025-11-17 16:13:03 -08:00
Daniel Hiltgen 2f36d769aa
bring back sysfs based VRAM information for AMD (#12871)
* build: optimize dockerfile context for iterating

This moves the copy of the source into the layer AFTER
doing software installs so we don't have to go through
the RPM install for cuda, etc. every time you touch a
source file.

* amd: implement linux sysfs based VRAM lookup

This adds a C++ implementation of sysfs DRM VRAM discovery
for more accurate free VRAM data on linux for AMD GPUs.
2025-11-17 15:40:58 -08:00
Daniel Hiltgen 399eacf486
ci: fix missing vulkan binaries in linux bundles (#13123) 2025-11-17 15:39:59 -08:00
Eva H 231cc878cb
app/ui: fix to point ollama client to ui backend in dev mode (#13079) 2025-11-17 12:58:35 -05:00
Jeffrey Morgan aa676b313f
docs: link to ollama.com instead of hardcoding list of cloud models (#13110) 2025-11-16 20:56:09 -08:00
omahs dd0ed0ef17
docs: fix typos in repository documentation (#10683) 2025-11-15 20:22:29 -08:00
Joel Bryan Juliano d5649821ae
readme: add Kdeps to community integrations (#11877)
Kdeps is an AI framework for building Dockerized full-stack AI
applications declaratively and uses Ollama LLM models on the
backend
2025-11-15 19:19:03 -08:00
pierwill 4cea757e70
server: clean up manifest documentation (#12995)
Co-authored-by: pierwill <pierwill@users.noreply.github.com>
2025-11-15 19:13:15 -08:00
Vignesh Skanda a751bc159c
llama: test case typo and readability improvements (#13078) 2025-11-15 18:54:27 -08:00
Laurențiu Nicola 5d31242fbf
discover: fix typos in runner.go (#13096) 2025-11-15 18:52:54 -08:00
Patrick Devine d7fd72193f
tests: basic benchmarking test framework (#12964)
This change adds a basic benchmarking test framework for Ollama which can
be used to determine the prefill, eval, load duration, and total duration
for running a given model or models.
2025-11-15 18:17:40 -08:00
Daniel Hiltgen 72ff5b9d8c
log: warn if user overrides detected (#13088)
Many failed GPU discovery issues recently can be traced to incorrect override settings.
This extra logging should help quickly spot these and guide users to try unsetting them first.
2025-11-14 14:36:28 -08:00
Parth Sareen ce29f695b4
docs: add logprobs to openapi (#13090) 2025-11-14 14:14:58 -08:00
Michael Yang 12b174b10e
fix tensor merge (#13053) 2025-11-13 15:32:34 -08:00
Michael Yang 333203d871
chore: update models to use slice/chunk/chunksections (#12934)
* use slice/chunks

* bert

* llama4

* gemma3n

* gptoss

* mistral3

* qwen3vl

* qwen25vl

* deepseek2

* remove unused ops
2025-11-13 15:20:12 -08:00
Parth Sareen c114987523
logprob: add bytes to logprobs (#13068) 2025-11-13 13:49:25 -08:00
Michael Yang b48083f33f
ml: add slice operation (#12870)
* slice

* chunk, chunksections
2025-11-13 13:28:21 -08:00
nicole pardal 482bec824f
embeddings: added cli command to embedding docs (#12993) 2025-11-13 13:24:13 -08:00
Kowyo 684a9a8c5a
docs: fix typo (VSCode -> VS Code) (#13072) 2025-11-12 20:49:33 -08:00
Jeffrey Morgan 54a76d3773
app: remove source code for previous JavaScript-based macOS app (#13067)
The code in this directory has been replaced with the
new Go version in the 'app' directory.
2025-11-12 20:37:43 -08:00
Radhi 8a75d8b015
readme: add AI UI to community integrations (#13035) 2025-11-12 17:08:50 -08:00
Jeffrey Morgan f206357412
readme: fix incorrect header in community integrations (#13065) 2025-11-12 17:00:16 -08:00
Daniel Hiltgen 8224cd9063
ci: fix win vulkan (#13062) 2025-11-12 10:32:24 -08:00
Daniel Hiltgen 6286d9a3a5
Enable Vulkan with a temporary opt-in setting (#12931)
* docs: vulkan information

* Revert "CI: Set up temporary opt-out Vulkan support (#12614)"

This reverts commit 8b6e5baee7.

* vulkan: temporary opt-in for Vulkan support

Revert this once we're ready to enable by default.

* win: add vulkan CI build
2025-11-12 08:40:38 -08:00
Daniel Hiltgen 3a9e8e9fd4
vulkan: temporary cary of vulkan fixes (#12971)
This should be reverted once we update ggml past b6897
2025-11-12 08:31:40 -08:00
Jeffrey Morgan cb1cb06478
docs: rename api-reference.md back to api.md since redirect stopped working (#13056) 2025-11-11 15:53:06 -08:00
Jeffrey Morgan 2d5e066c8c
docs: fix openapi.yaml warnings, rename api.md to api-reference.md (#12904) 2025-11-11 15:39:35 -08:00
Bruce MacDonald 15968714bd
docs/openapi: document that delete and copy responses are empty (#13055)
Some route endpoints return an empty response with a 200 OK. These should be documented in the OpenAPI doc. Note that the previous deletion response was not correct.
2025-11-11 15:07:21 -08:00
Jesse Gross 8bf38552de llm: Prefer dedicated GPUs over iGPUs when allocating memory
We currently assign model layers to GPUs according to free VRAM,
which assumes that GPU performance is roughly equal. This does not
work well for mixed dGPU and iGPU systems because iGPUs typically
use system memory which is large but their performance is slow.
This instead assigns layers to dGPUs first and then iGPUs.

In the future, this could be generalized to have a more fine grained
notion of GPU performance but dGPU vs. iGPU performance is the most
extreme.
2025-11-11 13:11:08 -08:00
Jesse Gross b13fbad0fe llm: Separate llamaServer and ollamaServer code paths
Originally, llamaServer represented old memory estimates, which
could be used with either the old or new engine. ollamaServer was
used only for the new estimates and new engine. Since these
implementations did not map directly to engine, there was engine-
specific code in common code paths.

Now that new estimates are always used for the new engine, there is
a direct mapping between server type and engine. This separates out
most of the engine-specific code into the correct implementation
to make things easier to understand.
2025-11-11 13:11:08 -08:00
Jesse Gross f560bd077f llm: Use Ollama engine memory layouts for both old and new engines
Currently for both the old and new engines, there is code to
calculate how much memory is required for a model and lay out
the layers onto GPUs. This reuses the new engine's lay out code
for the old engine as well, bringing them closer together. The
old engine continues to use its current method of estimating
required memory.

This reduces maintainence effort and improves consistency, as new
features only need to be implemented in one place. The newer code
is also more accurate, especially with multiple GPUs.
2025-11-11 13:11:08 -08:00
Jesse Gross 4372d0bfef llamarunner: Respect device ordering for offloaded layers
We used to control the way that llama.cpp saw devices using
CUDA_VISIBLE_DEVICES or similar. This would ensure that the layers
offloaded to a device were actually the ones intended. This is
particularly important because we might reorder devices based on
free memory or performance.

When we started explicitly scheduling layers, this logic went
away but the llamarunner didn't have any way to set the correct
order of devices. This meant that the correct number of layers
would be assigned to a device but not necessarily the layers
that were expected. This change sets up the devices correctly
based on the offload information.
2025-11-11 13:11:08 -08:00
Eva H 31361c4d3c
app/ui: do not send thinking to prevent errors with cloud provider 2025-11-11 16:09:24 -05:00
Baptiste Jamin 59241c5bee
server: add logprobs and top_logprobs support to Ollama's API (#12899)
Adds logprobs support to Ollama's API including support for Ollama's
OpenAI-compatible API. By specifying the new 'logprobs' boolean parameter
in the API, Ollama will return the log probabilities for each token generated.
'top_logprobs', an integer value can also be specified up to the value 20.
When specified, the API will also provide the number of most likely tokens to
return at each token position

Co-authored-by: Baptiste Jamin <baptiste@crisp.chat>
2025-11-11 08:49:50 -08:00
Eva Ho 2a9b61f099 address comment 2025-11-11 08:58:55 -05:00
Sheikh 6df4208836
docs: fix metal gpu section header (#13045) 2025-11-10 21:51:22 -08:00
Eva Ho 9d615cdaa0 fix test 2025-11-10 20:13:50 -05:00
Eva Ho 6a818b8a09 clean up 2025-11-10 19:08:42 -05:00
Eva Ho 2aaf29acb5 app/ui: do not send to prevent errors with cloud provider 2025-11-10 19:05:00 -05:00
Eva H a42f826acb
app/ui: using streamdown AI elements for markdown rendering 2025-11-10 12:05:59 -05:00
Bruce MacDonald e10a3533a5
app/docs: remove out of date storybook instructions (#13006) 2025-11-08 13:28:18 -08:00
Patrick Devine 91ec3ddbeb
bugfix: don't include both consolidated.safetensors and model-*.safetensors (#13010) 2025-11-07 22:41:57 -08:00
Parth Sareen 755ac3b069
docs: update n8n URL for Ollama (#12994) 2025-11-07 20:07:26 -08:00
Daniel Hiltgen 60b8973559
doc: re-add login autostart faq and GPU updates (#12975)
* doc: re-add login autostart faq

This appears to have been accidentally dropped during the doc migration.

* docs: GPU updates lost on the doc update

* review comments: improve windows login disable instructions
2025-11-07 11:21:44 -08:00