| Name | Last commit message | Last commit date |
| --- | --- | --- |
| ext_server | feat: add support for flash_attn (#4120) | 2024-05-20 13:36:03 -07:00 |
| generate | Port cuda/rocm skip build vars to linux | 2024-05-15 15:56:43 -07:00 |
| patches | update llama.cpp submodule to `614d3b9` (#4414) | 2024-05-16 13:53:09 -07:00 |
| filetype.go | comments | 2024-05-06 15:24:01 -07:00 |
| ggla.go | refactor tensor query | 2024-04-10 11:37:20 -07:00 |
| ggml.go | add phi2 mem | 2024-05-10 12:13:28 -07:00 |
| gguf.go | cleanup | 2024-05-20 16:13:57 -07:00 |
| llm.go | comments | 2024-05-06 15:24:01 -07:00 |
| llm_linux.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00 |
| memory.go | typo | 2024-05-13 14:18:34 -07:00 |
| server.go | feat: add support for flash_attn (#4120) | 2024-05-20 13:36:03 -07:00 |
| status.go | Switch back to subprocessing for llama.cpp | 2024-04-01 16:48:18 -07:00 |