ollama

Devon Rifkin 05ba4ca1f4 parsers: fix unicode handling for qwen3-coder

When trimming whitespace at the end of every chunk, we were iterating backwards over the string byte-by-byte instead of rune-by-rune.

As an example of how this can cause corruption, suppose we have the multi-byte character ✅ (`"\u2705"`), which is represented in utf-8 as the three bytes `0xE2 0x9C 0x85`. It happens that `0x85` is NEL, which passes `unicode.IsSpace()`. Because we were iterating byte-by-byte, this caused us to mistakenly slice in the middle of the rune, removing `0x85` and leaving `0xE2 0x9C`, which beyond being the incorrect place to slice, is not even a valid utf-8 character.

`trailingWhitespaceLen()` was modified to count from the end in a rune-aware way. Tests with various multibyte unicode characters were also added.

Fixes: #12414

2025-09-25 15:47:46 -07:00
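The byte-versus-rune problem described in the commit message can be reproduced with a short sketch. This is not the code from the repository: the function names `trailingWhitespaceLenBytes` and `trailingWhitespaceLenRunes` are hypothetical, and the real `trailingWhitespaceLen()` may be structured differently. The sketch only assumes the standard library calls `unicode.IsSpace` and `utf8.DecodeLastRuneInString`.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// trailingWhitespaceLenBytes (hypothetical name) mimics the buggy approach:
// it walks backwards one byte at a time and treats each byte as a rune.
func trailingWhitespaceLenBytes(s string) int {
	n := 0
	for i := len(s) - 1; i >= 0; i-- {
		if !unicode.IsSpace(rune(s[i])) {
			break
		}
		n++
	}
	return n
}

// trailingWhitespaceLenRunes (hypothetical name) walks backwards one rune at
// a time, so a multi-byte character is either counted whole or not at all.
// It returns the number of trailing whitespace bytes.
func trailingWhitespaceLenRunes(s string) int {
	total := 0
	for len(s) > 0 {
		r, size := utf8.DecodeLastRuneInString(s)
		if !unicode.IsSpace(r) {
			break
		}
		total += size
		s = s[:len(s)-size]
	}
	return total
}

func main() {
	s := "ok ✅" // ends with the bytes 0xE2 0x9C 0x85

	// The byte-wise version sees 0x85 (NEL) as whitespace and reports 1.
	bad := trailingWhitespaceLenBytes(s)
	// The rune-wise version decodes U+2705, which is not whitespace, and reports 0.
	good := trailingWhitespaceLenRunes(s)

	fmt.Println("byte-wise:", bad, "valid after trim:", utf8.ValidString(s[:len(s)-bad]))
	fmt.Println("rune-wise:", good, "valid after trim:", utf8.ValidString(s[:len(s)-good]))
}
```

Trimming by the byte-wise count cuts the string inside the ✅ character and leaves invalid UTF-8, while the rune-wise count leaves the string untouched.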
imageproc	imageproc mllama refactor (#7537)	2024-12-14 19:50:15 -08:00
input	batch: use tensors for outputs (#12185)	2025-09-15 14:33:06 -07:00
models	Grace/deepseek v3 migration (#12385)	2025-09-24 15:19:47 -07:00
parsers	parsers: fix unicode handling for qwen3-coder	2025-09-25 15:47:46 -07:00
renderers	address comments	2025-09-15 11:46:25 -07:00
testdata	gemma2 impl	2025-03-11 14:35:08 -07:00
bytepairencoding.go	multi-regexp pretokenizer (#12325)	2025-09-23 13:21:47 -07:00
bytepairencoding_test.go	multi-regexp pretokenizer (#12325)	2025-09-23 13:21:47 -07:00
model.go	fix: leaf alt name (#12390)	2025-09-23 17:50:53 -07:00
model_test.go	fix: leaf alt name (#12390)	2025-09-23 17:50:53 -07:00
sentencepiece.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00
sentencepiece_test.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00
textprocessor.go	model: handle multiple eos tokens (#10577)	2025-05-16 13:40:23 -07:00
vocabulary.go	embedding gemma model (#12181)	2025-09-04 09:09:07 -07:00
vocabulary_test.go	model: treat 'user defined' tokens as special tokens (#11077)	2025-06-16 16:03:16 -07:00
wordpiece.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00
wordpiece_test.go	model: implement bert in ollama engine (#9080)	2025-09-15 15:35:59 -07:00