Commit Graph

67 Commits

Author SHA1 Message Date
Patrick Devine 44bc36d063
docs: update the faq (#11760) 2025-08-06 16:55:57 -07:00
Daniel Hiltgen 66fb8575ce
doc: add MacOS docs (#11334)
also removes stale model dir instructions for windows
2025-07-08 15:38:04 -07:00
Daniel Hiltgen 20c3266e94
Reduce default parallelism to 1 (#11330)
The current scheduler algorithm of picking the paralellism based on available
VRAM complicates the upcoming dynamic layer memory allocation algorithm.  This
changes the default to 1, with the intent going forward that parallelism is
explicit and will no longer be dynamically determined.  Removal of the dynamic
logic will come in a follow up.
2025-07-08 12:08:37 -07:00
Devon Rifkin 44b466eeb2 config: update default context length to 4096 2025-04-28 17:03:27 -07:00
Devon Rifkin dd93e1af85
Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-04-28 16:54:11 -07:00
Devon Rifkin 424f648632
increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-04-22 16:33:24 -07:00
Parth Sareen b816ff86c9
docs: make context length faq readable (#10006) 2025-03-26 17:34:18 -07:00
Bradley Erickson 74b44fdf8f
docs: Add OLLAMA_ORIGINS for browser extension support (#9643) 2025-03-13 16:35:20 -07:00
frob d8a5d96b98
docs: Add OLLAMA_CONTEXT_LENGTH to FAQ. (#9545) 2025-03-10 11:02:54 -07:00
Azis Alvriyanto b901a712c6
docs: improve syntax highlighting in code blocks (#8854) 2025-02-07 09:55:07 -08:00
Leisure Linux a400df48c0
docs: include port in faq.md OLLAMA_HOST examples (#8905) 2025-02-06 18:45:09 -08:00
Sam 1bdab9fdb1
llm: introduce k/v context quantization (vRAM improvements) (#6279) 2024-12-03 15:57:19 -08:00
Jeffrey Morgan 55ea963c9e
update default model to llama3.2 (#6959) 2024-09-25 11:11:22 -07:00
Patrick Devine 5804cf1723
documentation for stopping a model (#6766) 2024-09-18 16:26:42 -07:00
Jeffrey Morgan 83a9b5271a
docs: update examples to use llama3.1 (#6718) 2024-09-09 22:47:16 -07:00
SnoopyTlion 741affdfd6
docs: update faq.md for OLLAMA_MODELS env var permissions (#6587) 2024-09-02 15:31:29 -04:00
Michael Yang bb362caf88 update faq 2024-08-23 13:37:21 -07:00
Daniel Hiltgen 1a83581a8e
Merge pull request #5895 from dhiltgen/sched_faq
Better explain multi-gpu behavior
2024-07-29 14:25:41 -07:00
Jeffrey Morgan 0e4d653687
upate to `llama3.1` elsewhere in repo (#6032) 2024-07-28 19:56:02 -07:00
Daniel Hiltgen 830fdd2715 Better explain multi-gpu behavior 2024-07-23 15:16:38 -07:00
Daniel Hiltgen 1f50356e8e Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen 69c04eecc4 Add windows radeon concurreny note 2024-07-02 12:46:14 -07:00
Daniel Hiltgen aae56abb7c Document concurrent behavior and settings 2024-06-28 13:15:57 -07:00
Patrick Devine 3bade04e10
doc updates for the faq/troubleshooting (#4565) 2024-05-21 15:30:09 -07:00
Patrick Devine f1548ef62d
update the FAQ to be more clear about windows env variables (#4415) 2024-05-13 18:01:13 -07:00
Jeffrey Chen d091fe3c21
Windows automatically recognizes username (#3214) 2024-05-06 15:03:14 -07:00
Daniel Hiltgen 20f6c06569 Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Dr Nic Williams e8aaea030e
Update 'llama2' -> 'llama3' in most places (#4116)
* Update 'llama2' -> 'llama3' in most places

---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-03 15:25:04 -04:00
Patrick Devine 74d2a9ef9a
add OLLAMA_KEEP_ALIVE env variable to FAQ (#3865) 2024-04-23 21:06:51 -07:00
Patrick Devine 1b272d5bcd
change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347) 2024-03-26 13:04:17 -07:00
Daniel Hiltgen d8fdbfd8da Add docs for GPU selection and nvidia uvm workaround 2024-03-21 11:52:54 +01:00
Bruce MacDonald a5ba0fcf78
doc: faq gpu compatibility (#3142) 2024-03-21 05:21:34 -04:00
Jeffrey Morgan 3a30bf56dc
Update faq.md 2024-03-20 17:48:39 +01:00
Jeffrey Morgan 7ed3e94105
Update faq.md 2024-03-18 10:24:39 +01:00
jmorganca 2297ad39da update `faq.md` 2024-03-18 10:17:59 +01:00
Daniel Hiltgen b53229a2ed Add docs explaining GPU selection env vars 2024-03-12 11:33:06 -07:00
Jeffrey Morgan f0425d3de9
Update faq.md 2024-02-20 20:44:45 -05:00
Jeffrey Morgan df56f1ee5e
Update faq.md 2024-02-19 22:16:42 -05:00
Jeffrey Morgan 41aca5c2d0
Update faq.md 2024-02-19 21:11:01 -05:00
Patrick Devine 9a7a4b9533
add faqs for memory pre-loading and the keep_alive setting (#2601) 2024-02-19 14:45:25 -08:00
Daniel Hiltgen b338c0635f Document setting server vars for windows 2024-02-19 13:30:46 -08:00
Tristan Rhodes 9774663013
Update faq.md with the location of models on Windows (#2545) 2024-02-16 11:04:19 -08:00
Michael Yang 93a756266c faq: update to use launchctl setenv 2024-01-22 13:10:13 -08:00
Matt Williams 511069a2a5 update where are models stored q
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-12-22 09:48:44 -08:00
Matt Williams 291700c92d
Clean up documentation (#1506)
* Clean up documentation

Will probably need to update with PRs for new release.

Signed-off-by: Matt Williams <m@technovangelist.com>

* Correcting to fit in 0.1.15 changes

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* addressing comments

Signed-off-by: Matt Williams <m@technovangelist.com>

* more api cleanup

Signed-off-by: Matt Williams <m@technovangelist.com>

* its llava not llama

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Updated hosting to server and documented all env vars

Signed-off-by: Matt Williams <m@technovangelist.com>

* remove last of the cli descriptions

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* update further per conversation with jeff earlier today

Signed-off-by: Matt Williams <m@technovangelist.com>

* cleanup the doc readme

Signed-off-by: Matt Williams <m@technovangelist.com>

* move upgrade to faq

Signed-off-by: Matt Williams <m@technovangelist.com>

* first change

Signed-off-by: Matt Williams <m@technovangelist.com>

* updated

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* examples in parent

Signed-off-by: Matt Williams <m@technovangelist.com>

* add exapmle for create model.

Signed-off-by: Matt Williams <m@technovangelist.com>

* update faq

Signed-off-by: Matt Williams <m@technovangelist.com>

* update create model api

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/api.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* update the readme in docs

Signed-off-by: Matt Williams <m@technovangelist.com>

* update a few more things

Signed-off-by: Matt Williams <m@technovangelist.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/faq.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update README.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/modelfile.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update docs/troubleshooting.md

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-12-22 09:10:01 -08:00
Patrick Devine a607d922f0
add FAQ for slow networking in WSL2 (#1646) 2023-12-20 16:27:24 -08:00
James Radtke 7eda3d0c55
Corrected transposed 129 to 192 for OLLAMA_ORIGINS example (#1325) 2023-11-29 22:44:17 -05:00
ToasterUwU 63097607b2
Correct MacOS Host port example (#1301) 2023-11-29 11:44:03 -05:00
ftorto e1a69d44c9
Update faq.md (#1299)
Fix a typo in the CA update command
2023-11-28 09:54:42 -05:00
Michael Yang c82ead4d01 faq: fix heading and add more details 2023-11-17 09:02:17 -08:00