// Package models registers all supported model architectures.
//
// Every import below is blank (_): each package is imported only for its
// side effects, so that its model architecture registers itself when this
// package is linked into a binary.
package models

import (
	_ "github.com/ollama/ollama/model/models/bert"
	_ "github.com/ollama/ollama/model/models/deepseek2"
	_ "github.com/ollama/ollama/model/models/gemma2"
	_ "github.com/ollama/ollama/model/models/gemma3"
	_ "github.com/ollama/ollama/model/models/gemma3n"
	_ "github.com/ollama/ollama/model/models/gptoss"
	_ "github.com/ollama/ollama/model/models/llama"
	_ "github.com/ollama/ollama/model/models/llama4"
	_ "github.com/ollama/ollama/model/models/mistral3"
	_ "github.com/ollama/ollama/model/models/mllama"
	_ "github.com/ollama/ollama/model/models/qwen2"
	_ "github.com/ollama/ollama/model/models/qwen25vl"
	_ "github.com/ollama/ollama/model/models/qwen3"
)
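
// Each architecture package above is expected to register itself in an
// init function when imported. A minimal sketch of that pattern follows,
// assuming a Register function in the model package; the Register
// signature and the New constructor are illustrative assumptions, not
// verbatim API:
//
//	package llama
//
//	import "github.com/ollama/ollama/model"
//
//	// init runs as a side effect of the blank import, mapping the
//	// architecture name to its constructor in the model registry.
//	func init() {
//		model.Register("llama", New)
//	}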