Packaging a Qwen Model into an Ollama-Runnable Format
I. Model Conversion
1. Download ollama and llama.cpp

```bash
git clone https://github.com/ollama/ollama.git
cd ollama
git submodule init
git submodule update llm/llama.cpp
```
2. Install dependencies

```bash
python3 -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt
```
3. Build the quantization tool

```bash
make -C llm/llama.cpp quantize
```
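If the build succeeds, a `quantize` binary appears under `llm/llama.cpp`. As a quick sanity check (an optional step, not in the original write-up), running it without arguments should print its usage text:

```bash
# Prints usage plus the list of supported quantization types
# (q4_0, q5_0, ...) if the binary was built correctly.
llm/llama.cpp/quantize
```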
4. Download the model

```bash
export HF_HUB_ENABLE_HF_TRANSFER=1
huggingface-cli download --token hf_xxx \
  --resume-download \
  --local-dir-use-symlinks False \
  --local-dir ./Qwen/Qwen1.5-7B-Chat \
  Qwen/Qwen1.5-7B-Chat
```
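One pitfall worth flagging: `HF_HUB_ENABLE_HF_TRANSFER=1` only works if the optional `hf_transfer` package is installed in the same environment; otherwise `huggingface_hub` errors out rather than falling back to the normal downloader. A minimal setup:

```bash
# hf_transfer is a Rust-based download accelerator for huggingface_hub;
# it must be present for HF_HUB_ENABLE_HF_TRANSFER=1 to take effect.
pip install -U huggingface_hub hf_transfer
```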
5. Convert the model

Here we use convert-hf-to-gguf.py in place of convert.py, since the Hugging Face converter is the one that handles the Qwen architecture.

```bash
python llm/llama.cpp/convert-hf-to-gguf.py ./Qwen/Qwen1.5-7B-Chat --outtype f16 --outfile converted.bin
```
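Before quantizing, it can be worth checking that the f16 GGUF actually loads and generates. One way to do that (assuming the `main` example target exists in this llama.cpp checkout; the prompt is arbitrary) is:

```bash
# Build llama.cpp's example CLI and run a short generation against the
# converted model to confirm the conversion produced a loadable GGUF.
make -C llm/llama.cpp main
llm/llama.cpp/main -m converted.bin -p "Hello" -n 32
```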
6. Quantize the model

```bash
llm/llama.cpp/quantize converted.bin quantized.bin q4_0
```
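`q4_0` packs weights into 4-bit blocks, so the quantized file should come out at roughly a quarter of the f16 size (for a 7B model, very roughly ~15 GB down to ~4-5 GB). A quick check:

```bash
# Compare the f16 and q4_0 file sizes to confirm quantization took effect.
ls -lh converted.bin quantized.bin
```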
II. Building the Ollama Package
1. Prepare the Modelfile

```
FROM quantized.bin
TEMPLATE "[INST] {{ .Prompt }} [/INST]"
```
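A caveat on the template: `[INST] ... [/INST]` is the Llama/Mistral instruction format, while Qwen1.5-Chat was trained on ChatML. If generations look malformed, a ChatML-style Modelfile along these lines may behave better (this variant is an assumption based on Qwen's published chat format, not part of the original steps):

```
FROM quantized.bin
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
```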
2. Build the deployment package

```
davisgao@mac ~/ ollama create davisgao/qwen1.5 -f ModelFile
transferring model data
creating model layer
creating template layer
using already created layer sha256:0d655da2f0b08a1210068e234792da4dfcb5cd2896dfd57a813f52ccc9d0ab95
using already created layer sha256:68693db5eb3e0501c644080a545730fc93d2ca2dfddf03633642b99f3a1f0e3c
using already created layer sha256:92a6f4b0a39deb0199816f5b3f25ad0db39e3d04b262a204ac76595e0e979654
writing manifest
success
```
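Before pushing, the package can be verified locally: `ollama run` loads the newly created model for an interactive chat, or runs a one-shot generation if a prompt is passed on the command line (the prompt here is just an example):

```bash
# One-shot generation to confirm the model and template work end to end.
ollama run davisgao/qwen1.5 "Introduce yourself briefly."
```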
3. Push the deployment package

This requires a registered ollama account, and the model must be namespaced under your username (here, `davisgao`).

```bash
ollama push davisgao/qwen1.5
```
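Once the push completes, the model is available from the ollama registry and can be fetched on any machine:

```bash
# Pull and run the published model elsewhere.
ollama pull davisgao/qwen1.5
ollama run davisgao/qwen1.5
```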