mirror of https://github.com/ollama/ollama.git
When we context shift, we delete half of the context and apply RoPE with an offset to the other half. Previously, we applied RoPE across the entire context in a single pass, using a zero offset for the deleted section. Now that shifting is done in batches, we can skip any batch in which every offset would be zero. This typically halves the number of operations.

| Name |
|---|
| cache.go |
| causal.go |
| causal_test.go |
| encoder.go |
| wrapper.go |
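
The batch-skipping idea from the commit message can be illustrated with a minimal, standalone sketch. It is not the code in `causal.go`: the `cell` type and the `shiftOffsets` and `batchesToShift` helpers are hypothetical names invented for this example, and the cache is modeled as a flat slice of cells. The sketch assumes that deleted cells and cells before the deleted range get a zero offset, and that only batches containing at least one non-zero offset need a RoPE pass.

```go
package main

import "fmt"

// cell is a hypothetical stand-in for one cache slot.
type cell struct {
	pos  int  // token position stored in this slot
	used bool // false once the slot has been deleted
}

// shiftOffsets returns the position offset for every cell after deleting
// positions [begin, end). Cells at or after end move back by end-begin;
// deleted cells and cells before begin keep a zero offset.
func shiftOffsets(cells []cell, begin, end int) []int32 {
	offsets := make([]int32, len(cells))
	delta := int32(begin - end)
	for i, c := range cells {
		if c.used && c.pos >= end {
			offsets[i] = delta
		}
	}
	return offsets
}

// batchesToShift splits offsets into batches of batchSize cells and returns
// the [start, stop) ranges that contain at least one non-zero offset.
// Batches whose offsets are all zero are skipped.
func batchesToShift(offsets []int32, batchSize int) [][2]int {
	if batchSize <= 0 {
		batchSize = len(offsets)
	}
	var out [][2]int
	for start := 0; start < len(offsets); start += batchSize {
		stop := start + batchSize
		if stop > len(offsets) {
			stop = len(offsets)
		}
		for _, o := range offsets[start:stop] {
			if o != 0 {
				out = append(out, [2]int{start, stop})
				break
			}
		}
	}
	return out
}

func main() {
	// Eight cells holding positions 0..7; the first half has been deleted.
	cells := make([]cell, 8)
	for i := range cells {
		cells[i] = cell{pos: i, used: i >= 4}
	}
	offsets := shiftOffsets(cells, 0, 4)
	fmt.Println("batches to shift:", batchesToShift(offsets, 4))
}
```

In this toy setup, only the second of the two batches is processed, which matches the "half the operations" estimate in the commit message. How the real cache lays out cells, and therefore how many batches end up all-zero, depends on details this sketch does not model.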