
7 Commits

Author SHA1 Message Date
Michael Yang 54055a6dae fix test 2025-04-25 16:59:01 -07:00
Parth Sareen a53d744b01 llama: remove model loading for grammar (#10096) 2025-04-24 11:51:19 -07:00
Parth Sareen 42a14f7f63 sample: add error handling for empty logits (#9740) 2025-03-20 11:11:18 -07:00
Jeffrey Morgan e093db92c4 sample: temporarily use grammars for constrained generation in new engine (#9586) 2025-03-10 16:17:39 +01:00
Parth Sareen 0682dae027 sample: improve ollama engine sampler performance (#9374) 2025-03-07 12:37:48 -08:00
This change brings in various interface cleanups and greatly improves the performance of the sampler.

Tested with llama3.2 on a local machine.
Performance improves from ~70 tokens/s to ~135 tokens/s with topK(40) enabled.
Without topK, performance is ~110 tokens/s.
Parth Sareen c245b0406f sample: remove transforms from greedy sampling (#9377) 2025-02-27 15:44:53 -08:00
Parth Sareen 0b7e1676eb sample: add sampling package for new engine (#8410) 2025-02-24 17:19:01 -08:00
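The log mentions two sampler behaviours: topK(40) restricting the candidate set, and error handling for empty logits (#9740). A minimal Go sketch of both ideas follows. It is hypothetical and is not the sample package's code. The commits do not say how top-k selection is implemented. Here it uses a size-k min-heap, which is one common technique: it costs O(n log k) over a vocabulary of n logits instead of the O(n log n) of a full sort. The names `topK`, `tokenHeap` and `errEmptyLogits` are illustrative assumptions.

```go
package main

import (
	"container/heap"
	"errors"
	"sort"
)

// errEmptyLogits is a hypothetical error value, illustrating the idea of
// rejecting empty logits (#9740); it is not the real package's identifier.
var errEmptyLogits = errors.New("sample: no logits to sample from")

// tokenHeap is a min-heap of token indices, ordered by their logit value,
// so the smallest of the current top-k candidates sits at idx[0].
type tokenHeap struct {
	idx    []int
	logits []float32
}

func (h tokenHeap) Len() int           { return len(h.idx) }
func (h tokenHeap) Less(i, j int) bool { return h.logits[h.idx[i]] < h.logits[h.idx[j]] }
func (h tokenHeap) Swap(i, j int)      { h.idx[i], h.idx[j] = h.idx[j], h.idx[i] }
func (h *tokenHeap) Push(x any)        { h.idx = append(h.idx, x.(int)) }
func (h *tokenHeap) Pop() any {
	n := len(h.idx)
	x := h.idx[n-1]
	h.idx = h.idx[:n-1]
	return x
}

// topK returns the indices of the k largest logits, highest first.
// A k outside 1..len(logits) means "keep every token".
// Empty input is reported as an error instead of panicking on an index.
func topK(logits []float32, k int) ([]int, error) {
	if len(logits) == 0 {
		return nil, errEmptyLogits
	}
	if k <= 0 || k > len(logits) {
		k = len(logits)
	}
	h := &tokenHeap{logits: logits}
	for i := range logits {
		if h.Len() < k {
			// Heap not full yet: every token is still a candidate.
			heap.Push(h, i)
		} else if logits[i] > logits[h.idx[0]] {
			// Replace the weakest kept candidate and restore heap order.
			h.idx[0] = i
			heap.Fix(h, 0)
		}
	}
	// Only k elements remain, so this final sort is cheap.
	out := h.idx
	sort.Slice(out, func(a, b int) bool { return logits[out[a]] > logits[out[b]] })
	return out, nil
}
```

With k = 40, any later softmax and random draw only touch 40 values rather than the full vocabulary. That offers one plausible reason a top-k path can outrun the unrestricted one, though the commit message does not state the cause of its measured speed-up.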