* get eos_token_id from generation_config.json
* refactor
* include both ids and strings in trace
* comments
* remove special case for gemma3 special vocab (#10743)
This change bring in various interface cleanups along with greatly improving the performance of the sampler.
Tested with llama3.2 on local machine.
Improves performance from ~ 70 tokens/s -> 135 tokens/s with topK(40) enabled.
Without topK performance is ~ 110 tokens/s