ollama/llm
Jesse Gross 06b7ee7781 discover: Disable flash attention for Jetson Xavier (CC 7.2)

GGML picks the wrong kernel and these systems fail with:

Sep 28 22:25:39 xavier ollama[48999]: //ml/backend/ggml/ggml/src/ggml-cuda/fattn-wmma-f16.cu:437: ERROR: CUDA kernel flash_attn_ext_f16 has no device code compatible with CUDA arch 720. ggml-cuda.cu was compiled for: __CUDA_ARCH_LIST__

Fixes #12442
2025-10-07 14:09:01 -07:00
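The commit above gates flash attention on GPU compute capability: CC 7.2 (Jetson Xavier) is excluded because GGML selects a WMMA flash-attention kernel that has no device code for arch 720. A minimal sketch of such a gate in Go is below; the type and function names are hypothetical illustrations, not Ollama's actual discover API.

```go
package main

import "fmt"

// gpuInfo is a hypothetical stand-in for a GPU discovery record.
type gpuInfo struct {
	Library           string // backend, e.g. "cuda"
	ComputeCapability string // e.g. "7.2"
}

// flashAttentionSupported reports whether flash attention should be
// enabled for a device. CC 7.2 (Jetson Xavier) is excluded because the
// selected GGML kernel crashes at runtime on that architecture.
func flashAttentionSupported(g gpuInfo) bool {
	if g.Library != "cuda" {
		return true // assume other backends make their own determination
	}
	return g.ComputeCapability != "7.2"
}

func main() {
	xavier := gpuInfo{Library: "cuda", ComputeCapability: "7.2"}
	ampere := gpuInfo{Library: "cuda", ComputeCapability: "8.6"}
	fmt.Println(flashAttentionSupported(xavier)) // disabled on Xavier
	fmt.Println(flashAttentionSupported(ampere)) // enabled elsewhere
}
```

Gating on the discovery side, before the runner starts, avoids the hard failure inside the CUDA kernel at inference time.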
llm_darwin.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_linux.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_windows.go win: lint fix (#10571) 2025-05-05 11:08:12 -07:00
memory.go discover: Disable flash attention for Jetson Xavier (CC 7.2) 2025-10-07 14:09:01 -07:00
memory_test.go Use runners for GPU discovery (#12090) 2025-10-01 15:12:32 -07:00
server.go discovery: prevent dup OLLAMA_LIBRARY_PATH (#12514) 2025-10-06 14:36:44 -07:00
server_test.go Use runners for GPU discovery (#12090) 2025-10-01 15:12:32 -07:00
status.go Improve crash reporting (#7728) 2024-11-19 16:26:57 -08:00