ollama/model/parsers
Devon Rifkin 05ba4ca1f4 parsers: fix unicode handling for qwen3-coder
When trimming whitespace at the end of every chunk, we were iterating
backwards over the string byte-by-byte instead of rune-by-rune.

As an example of how this can cause corruption, suppose we have the
multi-byte character ✅ (`"\u2705"`), which is represented in utf-8 as
the three bytes `0xE2 0x9C 0x85`. It happens that `0x85` is NEL, which
passes `unicode.IsSpace()`. Because we were iterating byte-by-byte, this
caused us to mistakenly slice in the middle of the rune, removing `0x85`
and leaving `0xE2 0x9C`, which, beyond being the wrong place to
slice, is not even valid utf-8.

`trailingWhitespaceLen()` was modified to count from the end in a
rune-aware way. Tests with various multibyte unicode characters were
also added.


Fixes: #12414
2025-09-25 15:47:46 -07:00
..
parsers.go harmony: remove special casing in routes.go 2025-09-18 14:55:59 -07:00
qwen3coder.go parsers: fix unicode handling for qwen3-coder 2025-09-25 15:47:46 -07:00
qwen3coder_test.go parsers: fix unicode handling for qwen3-coder 2025-09-25 15:47:46 -07:00