ollama

Jesse Gross 3fe74fba42 llm: Use first layer as memory buffer in estimation This is a partial revert of `0478d44` "Fixed over vram allcation dure to small initial layer sizes." Previously we used the size of the first layer as an extra reserved amount of space to buffer our memory estimates. The above commit changed this to use the largest layer. However, this had performance impacts on more models than the original commit was trying to fix. This is just a heuristic without an ideal solution, so this goes back to the historic behavior. Fixes: #10765, #10756, #10752, #10726		2025-05-19 14:03:34 -07:00
..
llm_darwin.go	Optimize container images for startup (#6547)	2024-09-12 12:10:30 -07:00
llm_linux.go	Optimize container images for startup (#6547)	2024-09-12 12:10:30 -07:00
llm_windows.go	win: lint fix (#10571)	2025-05-05 11:08:12 -07:00
memory.go	llm: Use first layer as memory buffer in estimation	2025-05-19 14:03:34 -07:00
memory_test.go	Move quantization to new backend (#10363)	2025-05-06 11:20:48 -07:00
server.go	ggml: Seperate tensor load from backend creation	2025-05-19 09:54:22 -07:00
server_test.go	lint: enable usetesting, disable tenv (#10594)	2025-05-08 11:42:14 -07:00
status.go	Improve crash reporting (#7728)	2024-11-19 16:26:57 -08:00