ollama

Commit Graph

Author	SHA1	Message	Date
Blake Mizerany	8294676150	server/internal/client/ollama: set User-Agent for registry client (#9775) This sets the agent header in DefaultRegistry to include the version of the client, OS, and architecture in the previous format, with a minor twist. Note: The version is obtained from the build info, instead of the version in version.Version, which should no longer be necessary, but we can remove it in a future commit. Using the build info is more accurate and also provides extra build information if the build is not tagged, and if it is "dirty". Previously, the version was just "0.0.0" with no other helpful information. The ollama.com registry and others handle this swimmingly.	2025-03-14 18:33:07 -07:00
Patrick Devine	ef378ad673	gemma3 quantization (#9776)	2025-03-14 17:41:07 -07:00
Daniel Hiltgen	2d2247e59e	Align versions for local builds (#9635) Darwin was using a different pattern for the version string than linux or windows.	2025-03-14 15:44:08 -07:00
Jesse Gross	7bf793a600	gemma3: Allow multiple images in a single input Previously processing multiple images in a batch would trigger segfaults so sending images together was disabled as a way to mitigate this. The trigger was processing one image on the CPU and one on the GPU. This can no longer happen: - The vision encoder is now on the GPU so both images would be processed on the GPU. - We require images to be fully contained in a batch and each image including its special tokens is over half the batch size. As a result, we will never get two images in the same batch. Fixes #9731	2025-03-14 15:38:54 -07:00
Jesse Gross	282bfaaa95	ollamarunner: Use a separate context per multimodal input Currently there is a single context per sequence, shared by all multimodal inputs. Since we build a vision encoder graph per image, with a large number of inputs we can eventually hit the maximum number of graph nodes per context. This changes to use a separate context for each image, ensuring that available resource limits are consistent.	2025-03-14 15:38:54 -07:00
Jesse Gross	9679f40146	ml: Allow models to constrain inputs to a single batch Models may require that a set of inputs all be processed as part of the same batch. For example, if an image has multiple patches with fully connected attention between them, we should not split the batch in the middle of an image. Fixes #9697	2025-03-14 15:38:54 -07:00
Bruce MacDonald	3892c3a703	llm: remove internal subprocess req and resp types (#9324) This commit refactors the LLM subsystem by removing internal subprocess request and response types. It consolidates duplicate type definitions across the codebase, moving them to centralized locations. The change also standardizes interfaces between components, simplifies the ServerStatusResp struct, and moves the ParseDurationMs function to a common package. This cleanup reduces code duplication between different runner implementations (llamarunner and ollamarunner).	2025-03-14 15:21:53 -07:00
Blake Mizerany	4e320b8b90	server/internal/chunks: remove chunks package (#9755)	2025-03-14 08:57:59 -07:00
Blake Mizerany	eb2b22b042	server/internal/client: use chunksums for concurrent blob verification (#9746) Replace large-chunk blob downloads with parallel small-chunk verification to solve timeout and performance issues. Registry users experienced progressively slowing download speeds as large-chunk transfers aged, often timing out completely. The previous approach downloaded blobs in a few large chunks but required a separate, single-threaded pass to read the entire blob back from disk for verification after download completion. This change uses the new chunksums API to fetch many smaller chunk+digest pairs, allowing concurrent downloads and immediate verification as each chunk arrives. Chunks are written directly to their final positions, eliminating the entire separate verification pass. The result is more reliable downloads that maintain speed throughout the transfer process and significantly faster overall completion, especially over unstable connections or with large blobs.	2025-03-13 22:18:29 -07:00
Michael Yang	4ea4d2b189	Merge pull request #9703 from ollama/mxyng/gemma3-memory count gemma3 vision tensors	2025-03-13 16:56:34 -07:00
Michael Yang	8d76fa23ef	count non-repeating vision layers	2025-03-13 16:53:29 -07:00
Bradley Erickson	74b44fdf8f	docs: Add OLLAMA_ORIGINS for browser extension support (#9643)	2025-03-13 16:35:20 -07:00
Michael Yang	65b88c544f	fix divide by zero	2025-03-13 16:35:00 -07:00
Michael Yang	a422ba39c9	roughly count gemma3 graph the largest operation is by far (q @ k) so just count that for simplicity	2025-03-13 16:35:00 -07:00
Michael Yang	d2ec22371e	count all vision tensors	2025-03-13 16:35:00 -07:00
Michael Yang	033cec232a	count gemma3 vision tensors	2025-03-13 16:34:42 -07:00
Michael Yang	543240fb5f	Merge pull request #9741 from ollama/mxyng/visionless fix: error if image requested without vision model	2025-03-13 15:03:25 -07:00
Patrick Devine	4bed739259	add verbose mode to the show command (#9640) Add metadata and tensor information to the show command to be able to see more information about a model. This outputs the same data as shown on the model details page on ollama.com	2025-03-13 14:24:27 -07:00
Patrick Devine	80c7ce381b	fix: change default context size for gemma3 (#9744)	2025-03-13 13:59:19 -07:00
Michael Yang	ccfd41c4f0	Merge pull request #9742 from ollama/mxyng/engine-error-embeddings fix: error on models that don't support embeddings	2025-03-13 13:12:33 -07:00
Michael Yang	3e102b7dad	Update model/model.go Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2025-03-13 13:11:52 -07:00
Michael Yang	ec46f3286c	engine: error on embeddings; not currently implemented	2025-03-13 11:40:55 -07:00
Michael Yang	5e2e0b46b1	fix: error if image requested without vision model	2025-03-13 10:52:09 -07:00
Michael Yang	45a13b1dec	Merge pull request #9688 from Shane-XB-Qian/debug_mistype_lld ollama-debug.c: correct mistype	2025-03-13 10:12:44 -07:00
Parth Sareen	5c0b663969	sample: separate softmax and temperature transforms (#9732)	2025-03-13 09:53:27 -07:00
shane.xb.qian	30d7a59ba8	ollama-debug.c: change 'ld' to 'PRIi64' * macOS has different definition per info from @mxyng	2025-03-13 17:10:37 +08:00
ParthSareen	4aeb67ef4c	sample: do all sorting in topK	2025-03-12 11:59:17 -07:00
ParthSareen	3ba91634c1	sample: simplify top_k=0 sorting	2025-03-12 11:59:17 -07:00
ParthSareen	1b7433b71e	sample: use container/heap for top_k	2025-03-12 11:59:17 -07:00
Bruce MacDonald	a70820daa0	models/gemma3: remove final logit softcap (#9692) Softcap isn't in the whitepaper/implementation for the language model so we should remove it. There is no discernible difference in output with it removed.	2025-03-12 10:17:57 -07:00
Shane-XB-Qian	6b45b1d6b4	cli: adding support ctrl-n/p like general cli (#9136) Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>	2025-03-12 08:51:56 -07:00
shane.xb.qian	85ab552028	ollama-debug.c: correct mistype Signed-off-by: shane.xb.qian <shane.qian@foxmail.com>	2025-03-12 22:32:30 +08:00
frob	b3af953a55	cli: don't exit for invalid model during /load. (#9576) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-03-11 23:42:53 -07:00
Michael	ad4e0bf3be	Adding Gemma 3 to readme (#9671)	2025-03-12 07:39:25 +01:00
Michael Yang	aee28501b5	Merge pull request #9661 from ollama/gemma engine: add gemma support	2025-03-11 15:07:50 -07:00
jmorganca	83f0ec8269	all: address linter errors	2025-03-11 14:49:20 -07:00
jmorganca	c6b6938b3a	kvcache: fix tests by adding AvgPool2D stub	2025-03-11 14:49:20 -07:00
jmorganca	fb4664fcec	model: add more spm tokenizer tests	2025-03-11 14:49:20 -07:00
jmorganca	20e3593863	model: validate left and right pairs before merging them	2025-03-11 14:49:20 -07:00
Michael Yang	63a394068c	use 2d pooling	2025-03-11 14:49:20 -07:00
Daniel Hiltgen	ab39e08eb9	llm: auto detect models that require Ollama Engine (#1)	2025-03-11 14:49:20 -07:00
jmorganca	11bfa62796	add trailing \n\n after <end_of_image> to match reference implementation	2025-03-11 14:49:20 -07:00
jmorganca	f63e62e546	reduce kernel size, add TODO for loading from config	2025-03-11 14:49:20 -07:00
jmorganca	65b0f329d1	Revert "Allow models to force a new batch" This reverts commit c7eae586b899083acebcd9b3847b89ea78c2850c.	2025-03-11 14:49:20 -07:00
Jesse Gross	06007c0a18	Allow models to force a new batch This is useful for a few things: - Work around bugs, such as having 2 images in one batch - Keep the image in a single batch for fully connected attention - Improve performance by not evaluating embeddings multiple times	2025-03-11 14:49:20 -07:00
Jesse Gross	a8e83a7654	Disable causal attention based on batch index Currently we are using positions, which are relative to a sequence and may not be unique.	2025-03-11 14:49:20 -07:00
Jesse Gross	475005504e	Restrict Gemma to a single image per request	2025-03-11 14:49:20 -07:00
Jesse Gross	2c40c4d35e	Fix follow up images and images split across batches	2025-03-11 14:49:19 -07:00
Michael Yang	e95278932b	use non-causal mask only for image positions	2025-03-11 14:49:19 -07:00
Michael Yang	9d2a20a763	use non-causal mask for inputs with images	2025-03-11 14:49:19 -07:00
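
The eb2b22b042 entry above describes fetching many small chunk+digest pairs, verifying each chunk as it arrives, and writing it straight to its final position. The idea can be sketched roughly as follows. This is a minimal Python illustration, not the Go code in server/internal/client: `make_chunksums`, `fetch_range`, and `download_verified` are hypothetical names, the "registry" is an in-memory byte string, and SHA-256 is assumed as the digest.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def make_chunksums(blob: bytes, chunk_size: int):
    """Hypothetical stand-in for a chunksums listing: (start, end, digest) per chunk."""
    sums = []
    for start in range(0, len(blob), chunk_size):
        end = min(start + chunk_size, len(blob))
        sums.append((start, end, hashlib.sha256(blob[start:end]).hexdigest()))
    return sums


def fetch_range(source: bytes, start: int, end: int) -> bytes:
    """Hypothetical stand-in for a ranged download of bytes [start, end)."""
    return source[start:end]


def download_verified(source: bytes, chunksums, workers: int = 4) -> bytes:
    """Fetch chunks concurrently, verifying each one before it is written."""
    size = chunksums[-1][1] if chunksums else 0
    dest = bytearray(size)  # preallocated so every chunk has a final position

    def fetch_and_verify(entry):
        start, end, want = entry
        data = fetch_range(source, start, end)
        if hashlib.sha256(data).hexdigest() != want:
            raise ValueError(f"chunk {start}-{end}: digest mismatch")
        # Verified chunk goes directly to its final offset; ranges are disjoint.
        dest[start:end] = data

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so a worker's exception is re-raised here.
        list(pool.map(fetch_and_verify, chunksums))
    return bytes(dest)
```

Because each chunk is checked before it is written, no second pass over the finished blob is needed, and a corrupted chunk fails immediately rather than after the whole transfer completes.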

4271 Commits

All Branches