Import a model
This guide walks through importing a GGUF, PyTorch or Safetensors model.
Importing (GGUF)
Step 1: Write a Modelfile
Start by creating a Modelfile. This file is the blueprint for your model, specifying weights, parameters, prompt templates and more.
FROM ./mistral-7b-v0.1.Q4_0.gguf
(Optional) Many chat models require a prompt template to answer correctly. You can set a default prompt template with the TEMPLATE instruction in the Modelfile:
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
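One way to create this file from a shell is a here-document. This is a minimal sketch, assuming the GGUF file sits in the current directory under the name used above; the quoted EOF delimiter keeps the shell from touching the template text.

```shell
# Write the Modelfile with a quoted here-document so the shell leaves
# the {{ .Prompt }} template untouched.
cat > Modelfile <<'EOF'
FROM ./mistral-7b-v0.1.Q4_0.gguf
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
EOF
```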
Step 2: Create the Ollama model
Next, create a model from your Modelfile:
ollama create example -f Modelfile
Step 3: Run your model
Finally, test the model with ollama run:
ollama run example "What is your favourite condiment?"
Importing (PyTorch & Safetensors)
Importing from PyTorch and Safetensors is a longer process than importing from GGUF. Improvements that make it easier are a work in progress.
Setup
First, clone the ollama/ollama repo:
git clone git@github.com:ollama/ollama.git ollama
cd ollama
and then fetch its llama.cpp submodule:
git submodule init
git submodule update llm/llama.cpp
Next, install the Python dependencies:
python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt
Then build the quantize tool:
make -C llm/llama.cpp quantize
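The setup commands above rely on git, python3 and make being installed. As a rough sketch, a check like the following can be run first; missing_tools is a hypothetical helper name, not part of Ollama or llama.cpp.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The setup commands above call git, python3 and make.
missing=$(missing_tools git python3 make)
if [ -n "$missing" ]; then
  # $missing is left unquoted so the names print on one line.
  echo "Missing required tools:" $missing >&2
fi
```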
Clone the HuggingFace repository (optional)
If the model is currently hosted in a HuggingFace repository, first clone that repository to download the raw model.
Install Git LFS, verify it's installed, and then clone the model's repository:
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model
Convert the model
Note: some model architectures require using specific convert scripts. For example, Qwen models require running convert-hf-to-gguf.py instead of convert.py.
python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin
Quantize the model
llm/llama.cpp/quantize converted.bin quantized.bin q4_0
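The convert and quantize commands can be chained so that quantization only runs if conversion succeeds. The function below is a sketch under stated assumptions: convert_and_quantize and the DRY_RUN variable are hypothetical names introduced here, and the file names match the ones used above.

```shell
# Hypothetical wrapper around the two commands above.
# Usage: convert_and_quantize <model-dir> [quant-type]
# Set DRY_RUN=1 to print the commands instead of running them.
convert_and_quantize() {
  model_dir=$1
  quant=${2:-q4_0}
  # && stops at the first failing step, so quantize never runs
  # against a missing or partial converted.bin.
  ${DRY_RUN:+echo} python llm/llama.cpp/convert.py "$model_dir" --outtype f16 --outfile converted.bin &&
    ${DRY_RUN:+echo} llm/llama.cpp/quantize converted.bin quantized.bin "$quant"
}
```

Running DRY_RUN=1 convert_and_quantize ./model q4_0 prints both commands without executing them, which is a cheap way to check paths before starting a long conversion.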
Step 3: Write a Modelfile
Next, create a Modelfile for your model:
FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
Step 4: Create the Ollama model
Next, create a model from your Modelfile:
ollama create example -f Modelfile
Step 5: Run your model
Finally, test the model with ollama run:
ollama run example "What is your favourite condiment?"
Publishing your model (optional – early alpha)
Publishing models is in early alpha. If you'd like to publish your model to share with others, follow these steps:
- Create an account
- Run cat ~/.ollama/id_ed25519.pub (or type %USERPROFILE%\.ollama\id_ed25519.pub on Windows) to view your Ollama public key. Copy this to the clipboard.
- Add your public key to your Ollama account
Next, copy your model to your username's namespace:
ollama cp example <your username>/example
Then push the model:
ollama push <your username>/example
After publishing, your model will be available at https://ollama.com/<your username>/example.
Quantization reference
The quantization options are as follows (from highest to lowest level of quantization). Note: some architectures such as Falcon do not support K quants.
- q2_K
- q3_K
- q3_K_S
- q3_K_M
- q3_K_L
- q4_0 (recommended)
- q4_1
- q4_K
- q4_K_S
- q4_K_M
- q5_0
- q5_1
- q5_K
- q5_K_S
- q5_K_M
- q6_K
- q8_0
- f16
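A typo in the quantization type is easy to catch before starting a long run. The helper below is a minimal sketch that accepts only the types listed above; is_valid_quant is a hypothetical name, and the check does not account for architectures that lack K quant support.

```shell
# Hypothetical helper: succeed only if $1 is one of the quantization
# types listed in this guide.
is_valid_quant() {
  case "$1" in
    q2_K|q3_K|q3_K_S|q3_K_M|q3_K_L|q4_K|q4_K_S|q4_K_M) return 0 ;;
    q5_K|q5_K_S|q5_K_M|q6_K) return 0 ;;
    q4_0|q4_1|q5_0|q5_1|q8_0|f16) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_valid_quant "$QUANT" && llm/llama.cpp/quantize converted.bin quantized.bin "$QUANT" runs the quantize step only when QUANT holds a recognized type.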