ollama/model/models
Latest commit fbd82ba5bb by Grace
Grace/deepseek v3 migration (#12385)
* init deepseek model file

* temp removal of flash attention implementation

* shapes are proper, can make a forward pass

* query, key, value have good cosine similarity, but the max diff is a bit high

* Attention block is working! (eager attention for now; the mask line has not been added yet)

* working MoE at around 0.95 cosine sim

* added cosine similarity function (see the sketch after this log)

* Starting end to end structure

* Trying (and failing) to get RoPE to work; going to test the full thing on tater

* running on tater36... just not the right outputs

* we have the right values for RoPE... but it's still not working?

* change Extrapolation Factor to 1

* removed adding residuals twice; removed normalization from the shared expert; refactored the norms (attention, MLP) to sit outside the attention/MLP blocks, in the transformer block instead; added cache setLayer

* Temporary Modelfiles for CPU

* change kpass intermediate step to kv; the two layer outputs [0, 1] look fine

* this calls for 16 chicken nuggets

* whoops

* cleaning up code

* delete stuff we don't need

* getting rid of debug statements for llama.cpp

* working with long contexts

* fix long context view error

* reverting some changes I made to files that are not a part of this PR

* Added proper tokenizer for deepseek3

* clean up model and go test

* remove Modelfile

* not passing the tests

* whoops

* how to pass the CI tests

* resolving some of the comments

* rename

* linted and renamed deepseek3 -> deepseek2

* remove name.go

* addressed changes; the main change was adopting the qwen3 naming scheme

* I cannot with linters

* clean up logs

---------

Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain>
Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local>
Co-authored-by: graceguo <graceguo@tater36.localdomain>
2025-09-24 15:19:47 -07:00
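
Several commits in the log above gauge correctness by comparing this port's tensors against a reference implementation with cosine similarity, while separately tracking the maximum element-wise difference. Below is a minimal sketch of that kind of check, assuming layer outputs flattened to []float32 slices; the function names are illustrative, not the ones added in the PR.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns a value in [-1, 1]; values near 1.0 (such as
// the ~0.95 reported for the MoE block above) mean the two outputs point
// in nearly the same direction.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) {
		panic("tensor lengths must match")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// maxAbsDiff reports the largest element-wise deviation: the "max diff"
// that can stay high even when cosine similarity looks good, since cosine
// similarity measures direction rather than magnitude.
func maxAbsDiff(a, b []float32) float64 {
	var max float64
	for i := range a {
		if d := math.Abs(float64(a[i]) - float64(b[i])); d > max {
			max = d
		}
	}
	return max
}

func main() {
	got := []float32{0.10, 0.21, 0.29}
	want := []float32{0.10, 0.20, 0.30}
	fmt.Printf("cos=%.4f max diff=%.4f\n",
		cosineSimilarity(got, want), maxAbsDiff(got, want))
}
```

This pairing explains the "good cosine similarity, but the max diff is a bit high" observation in the log: the angle-based metric can look healthy while individual elements are still off.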
Name       Latest commit                                      Date
bert       embed: cleanup (#12299)                            2025-09-16 09:48:42 -07:00
deepseek2  Grace/deepseek v3 migration (#12385)               2025-09-24 15:19:47 -07:00
gemma2     gemma: fix rope scaling for qat models (#12348)    2025-09-19 15:04:40 -07:00
gemma3     gemma: fix rope scaling for qat models (#12348)    2025-09-19 15:04:40 -07:00
gemma3n    fix(llama): other llama flavours (#12308)          2025-09-17 12:12:21 -07:00
gptoss     multi-regexp pretokenizer (#12325)                 2025-09-23 13:21:47 -07:00
llama      multi-regexp pretokenizer (#12325)                 2025-09-23 13:21:47 -07:00
llama4     add pre:, suf: to tags (#12274)                    2025-09-23 16:08:57 -07:00
mistral3   multi-regexp pretokenizer (#12325)                 2025-09-23 13:21:47 -07:00
mllama     multi-regexp pretokenizer (#12325)                 2025-09-23 13:21:47 -07:00
qwen2      multi-regexp pretokenizer (#12325)                 2025-09-23 13:21:47 -07:00
qwen3      multi-regexp pretokenizer (#12325)                 2025-09-23 13:21:47 -07:00
qwen25vl   multi-regexp pretokenizer (#12325)                 2025-09-23 13:21:47 -07:00
models.go  Grace/deepseek v3 migration (#12385)               2025-09-24 15:19:47 -07:00
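
Many entries in the listing above were last touched by the multi-regexp pretokenizer change (#12325), which moves pretokenization from a single regular expression to an ordered list of them. The sketch below illustrates only the general idea, assuming Go's regexp package; the patterns and names are hypothetical, not ollama's, and real GPT-style pretokenizer patterns rely on lookaheads that Go's RE2 engine does not support.

```go
package main

import (
	"fmt"
	"regexp"
)

// pretokenize applies each regexp in turn: fragments produced by one
// pattern are re-split by the next. It assumes every pattern, like
// GPT-2-style ones, collectively matches all of its input.
func pretokenize(text string, patterns []*regexp.Regexp) []string {
	fragments := []string{text}
	for _, re := range patterns {
		var next []string
		for _, frag := range fragments {
			next = append(next, re.FindAllString(frag, -1)...)
		}
		fragments = next
	}
	return fragments
}

func main() {
	patterns := []*regexp.Regexp{
		// first isolate digit runs, then split the rest into words,
		// punctuation, and whitespace
		regexp.MustCompile(`\d+|\D+`),
		regexp.MustCompile(`[A-Za-z]+|[^A-Za-z\s]+|\s+`),
	}
	fmt.Printf("%q\n", pretokenize("hello, 42 worlds!", patterns))
	// ["hello" "," " " "42" " " "worlds" "!"]
}
```

Ordering is the point of the scheme: the digit pass runs first so later passes never have to re-handle number runs, which is the kind of sequencing a single-regexp pretokenizer cannot express as directly.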