History

Devon Rifkin 5f57b0ef42 add thinking support to the api and cli (#10584 ) - Both `/api/generate` and `/api/chat` now accept a `"think"` option that allows specifying whether thinking mode should be on or not - Templates get passed this new option so, e.g., qwen3's template can put `/think` or `/no_think` in the system prompt depending on the value of the setting - Models' thinking support is inferred by inspecting model templates. The prefix and suffix the parser uses to identify thinking support is also automatically inferred from templates - Thinking control & parsing is opt-in via the API to prevent breaking existing API consumers. If the `"think"` option is not specified, the behavior is unchanged from previous versions of ollama - Add parsing for thinking blocks in both streaming/non-streaming mode in both `/generate` and `/chat` - Update the CLI to make use of these changes. Users can pass `--think` or `--think=false` to control thinking, or during an interactive session they can use the commands `/set think` or `/set nothink` - A `--hidethinking` option has also been added to the CLI. This makes it easy to use thinking in scripting scenarios like `ollama run qwen3 --think --hidethinking "my question here"` where you just want to see the answer but still want the benefits of thinking models		2025-05-28 19:38:52 -07:00
..
images	Fix import image width (#6528 )	2024-08-27 14:19:47 -07:00
README.md	docs: fix path to examples (#8438 )	2025-01-15 11:49:12 -08:00
api.md	add thinking support to the api and cli (#10584 )	2025-05-28 19:38:52 -07:00
benchmark.md	benchmark: performance of running ollama server (#8643 )	2025-03-21 13:08:20 -07:00
development.md	server/.../backoff,syncs: don't break builds without synctest (#9484 )	2025-03-03 16:45:40 -08:00
docker.md	docs: improve syntax highlighting in code blocks (#8854 )	2025-02-07 09:55:07 -08:00
examples.md	examples: remove codified examples (#8267 )	2025-01-13 11:26:22 -08:00
faq.md	config: update default context length to 4096	2025-04-28 17:03:27 -07:00
gpu.md	Revert "remove cuda v11 (#10569 )" (#10692 )	2025-05-13 13:12:54 -07:00
import.md	docs: remove unsupported quantizations (#10842 )	2025-05-24 13:17:26 -07:00
linux.md	Better WantedBy declaration	2025-03-07 10:26:31 +01:00
modelfile.md	api: remove unused sampling parameters (#10581 )	2025-05-08 08:31:08 -07:00
openai.md	docs: improve syntax highlighting in code blocks (#8854 )	2025-02-07 09:55:07 -08:00
template.md	docs: change more template blocks to have syntax highlighting	2025-04-15 12:08:11 -07:00
troubleshooting.md	Revert "remove cuda v11 (#10569 )" (#10692 )	2025-05-13 13:12:54 -07:00
windows.md	cleanup: remove OLLAMA_TMPDIR and references to temporary executables (#10182 )	2025-04-08 15:01:39 -07:00

README.md

Documentation

Getting Started

Reference

Resources