ollama

Commit Graph

Author	SHA1	Message	Date
Grace	fbd82ba5bb	Grace/deepseek v3 migration (#12385) * init deepseek model file * temp removal of flash attention implementation * shapes and proper, can make a pass * query, key, value have good cosine similarity, but the max diff is a bit high * Attention block is working! ** with eager for now, have not added the mask line * Attention block is working! ** with eager for now, have not added the mask line * working MoE at around 0.95 cosine sim * added cosine similarity function * Starting end to end structure * Trying (and failing) to get rope to work, going to test full thing on tater * running on tater36... just not the right outputs * we have the right values for rope... but its still not working? * change Extrapolation Factor to 1 * removed adding residuals twice, removed normalization from shared expert, refactored Norms (Attention, MLP) to be outside the (Attention, MLP) blocks and in the Transformer block instead, add cache setLayer * Temporary modelfiles for cpu * change kpass intermediate step to kv, two layer outputs [0,1] look fine * this calls for 16 chicken nuggets * whoops * cleaning up code * delete stuff we dont need * getting rid of debug statements for llama cpp * working with long contexts * fix long context view error * reverting some changes I made for files that are not a part of pr * Added proper tokenizer for deepseek3 * clean up model and go test * remove Modelfile * not passing the tests * whoops * how to pass the ci tests * resolving some of the comments * rename * linted and renamed deepseek3 -> deepseek2 * remove name go * addressed changes - main change was adopting qwen3 naming scheme * I cannot with linters * clean up logs * clean up logs --------- Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain> Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local> Co-authored-by: graceguo <graceguo@tater36.localdomain>	2025-09-24 15:19:47 -07:00
Michael Yang	3f6642f6fc	model: implement bert in ollama engine (#9080) * fix truncate * s/SentencePieceModel/SentencePiece/ * bert * wordpiece * refactor pooling * more tokenizers * normalize embeddings	2025-09-15 15:35:59 -07:00
Michael Yang	fa7776fd24	gpt-oss (#11672) * bf16 * tests * gpt-oss * enable gptoss for engine * rough estimate * convert to mxfp4 * handle safetensors U8 * clamp glu/linear * update tokenizer * MXFP4 support This implements the Open Compute Microscaling (MX) FP4 format as a tensor type with backend implementations focusing on mulmat and mulmatid on CPU, CUDA, and Metal. * Unit tests for MXFP4 support This exercises various operations and shapes on both CPU and GPU (if detected on the system) * cuda graph * unit test adjustments * cuda: optimize memory access Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4 * mac: fix crash on old macos versions cblas_sgemm is only supported on v13.3 and up, however bf16 is only supported on v14+ so we were falling back to ggml-blas and crashing on bf16 tensors. Checking for the function being null seems to be the simplest way to conditionally avoid registering the backend. * server: Minimum context length for gptoss This model requires a minimum context length of 8192 to function effectively. Users can set higher values through all normal mechanisms but lower values will be silently reset. * ggml: Multiply by numParallel for gptoss sliding window When computing the graph size estimate, the context size is already multiplied by numParallel so estimates reflect that. However, since sliding window models use a smaller, fixed context size, they need to manually take numParallel into account. * gpt-oss integration includes harmony parser and thinking levels, etc. * fix sync * fix tests * fix lint --------- Co-authored-by: Daniel Hiltgen <daniel@ollama.com> Co-authored-by: Jesse Gross <jesse@ollama.com> Co-authored-by: Devon Rifkin <drifkin@drifkin.net>	2025-08-05 12:21:16 -07:00
Michael Yang	73b642e6f3	add new gemma model (#11204) * update patches * cherry pick metal mean kernel * cherry pick cuda mean kernel * gemma3n	2025-06-25 21:47:09 -07:00
Michael Yang	c890011322	feat: port qwen2 model (#10782)	2025-05-21 10:21:24 -07:00
Michael Yang	e0ed984cde	feat: qwen3 dense and sparse models (#10708) * feat: qwen3 dense * feat: qwen3moe * fix llama4 moe	2025-05-21 10:21:07 -07:00
Bruce MacDonald	0aa8b371dd	model: add Qwen2.5-VL support (#10385)	2025-05-13 20:58:02 -07:00
Michael Yang	f0c66e6dea	llama4	2025-04-25 16:59:20 -07:00
Bruce MacDonald	6bd0a983cd	model: support for mistral-small in the ollama runner Mistral is a popular research lab making open source models. This updates the forward pass of llama architecture models to support both llama models and mistral models by accounting for additional metadata present in mistral models, and finding the correct dimensions for the output projection.	2025-04-03 16:57:36 -07:00
Patrick Devine	5f74d1fd47	gemma2 impl	2025-03-11 14:35:08 -07:00
Jesse Gross	6945617af5	models: Move model into their own directory This allows there to be a file that is a list of models that is not mixed into the runner code.	2025-02-13 17:09:26 -08:00

11 Commits