登录注册写文章

llama.cpp qwen2

香菜香菜我是折耳根

llama.cpp qwen2

开发环境

Apple M2
https://zhuanlan.zhihu.com/p/690548599
通义千问

1、准备模型
brew install git-lfs
git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git

2、准备llama.cpp
brew install ccache
git clone git@github.com:ggerganov/llama.cpp.git
cd llama.cpp
make

conda create -n llama-cpp python=3.10
conda activate llama-cpp
pip install -r requirements.txt

pip install tiktoken

3、模型转换
将下载的Qwen模型转换为GGUF文件格式。

这里可以写篇文章介绍GGUF、Qwen模型表示

python convert-hf-to-gguf.py ~/workspaces/ai/Qwen1.5-7B-Chat/

4、量化模型
./quantize ~/workspaces/ai/Qwen-7B-Chat/ggml-model-f16.gguf ./models/qwen-chat-ggml-model-Q4_K_M.gguf Q4_K_M

5、测试
./main -m models/qwen-chat-ggml-model-Q4_K_M.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e

Ascend NPU

©著作权归作者所有,转载或内容合作请联系作者
平台声明：文章内容（如有图片或视频亦包括在内）由作者上传并发布，文章内容仅代表作者本人观点，简书系信息发布平台，仅提供信息存储服务。

推荐阅读更多精彩内容

LLaMA-Factory 微调开源大模型
接触大模型有一段时间了，最近学习了一下使用LLaMA-Factory来对开源大模型进行微调，LLaMA-Facto...
雷涛赛文阅读 3,546评论 1赞 2
LLM实战：LLM微调加速神器-Unsloth+ Qwen1.5
1. 背景上一篇介绍了基于训练加速框架Unsloth，微调训练Llama3的显卡资源占用及训练时间对比。近期U...
mengrennwpu阅读 452评论 0赞 0
通义千问Qwen-72B-Chat基于PAI的低代码微调部署实践
作者：熊兮、求伯、一耘引言通义千问-72B（Qwen-72B）是阿里云研发的通义千问大模型系列的720亿参数规...
阿里云大数据AI平台阅读 930评论 0赞 0
LLM实战：LLM微调加速神器-Unsloth + LLama3
1. 背景五一结束后，本qiang~又投入了LLM的技术海洋中，本期将给大家带来LLM微调神器：Unsloth。...
mengrennwpu阅读 742评论 0赞 1
llm_finetune网页一键式大模型训练到服务的全流程平台
[LLM Finetune 网页格式一键式大模型训练到服务的全流程平台，包括数据上传、微调训练、模型合并、模型部署...
水他阅读 266评论 0赞 1

赞1赞

赞赏

手机看全文