ollama/llm
Latest commit: fdb109469f by Jesse Gross (2025-10-02 12:07:20 -07:00)
llm: Allow overriding flash attention setting

As we automatically enable flash attention for more models, there are likely some cases where we get it wrong. This allows setting OLLAMA_FLASH_ATTENTION=0 to disable it, even for models that usually have flash attention enabled.
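A minimal sketch of how such an override could be resolved, assuming only the OLLAMA_FLASH_ATTENTION variable named in the commit message; the resolveFlashAttention helper and its placement are illustrative and not the actual logic in server.go or memory.go:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// resolveFlashAttention illustrates the override described above:
// an explicit OLLAMA_FLASH_ATTENTION setting wins; otherwise the
// model's automatically chosen default is kept.
func resolveFlashAttention(modelDefault bool) bool {
	if v, ok := os.LookupEnv("OLLAMA_FLASH_ATTENTION"); ok {
		if enabled, err := strconv.ParseBool(v); err == nil {
			return enabled // user setting overrides the automatic choice
		}
	}
	return modelDefault // unset or unparsable: keep the automatic setting
}

func main() {
	// With OLLAMA_FLASH_ATTENTION=0 in the environment this prints "false",
	// even for a model that would normally have flash attention enabled.
	fmt.Println(resolveFlashAttention(true))
}
```

Under this sketch, starting the server with OLLAMA_FLASH_ATTENTION=0 would disable flash attention regardless of the per-model default, while leaving the variable unset preserves the automatic behavior.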
| File | Last commit | Date |
|---|---|---|
| llm_darwin.go | Optimize container images for startup (#6547) | 2024-09-12 12:10:30 -07:00 |
| llm_linux.go | Optimize container images for startup (#6547) | 2024-09-12 12:10:30 -07:00 |
| llm_windows.go | win: lint fix (#10571) | 2025-05-05 11:08:12 -07:00 |
| memory.go | llm: Allow overriding flash attention setting | 2025-10-02 12:07:20 -07:00 |
| memory_test.go | Use runners for GPU discovery (#12090) | 2025-10-01 15:12:32 -07:00 |
| server.go | llm: Allow overriding flash attention setting | 2025-10-02 12:07:20 -07:00 |
| server_test.go | Use runners for GPU discovery (#12090) | 2025-10-01 15:12:32 -07:00 |
| status.go | Improve crash reporting (#7728) | 2024-11-19 16:26:57 -08:00 |