2.Docker运行FastChat

FastChat流程图:

image.png

#保存模型数据
mkdir models
#模型下载时的缓存目录，防止断线需要重下
mkdir hf-cache

创建Dockerfile

FROM pytorch2
#fschat不限版本给你全装，限死版本告诉你缺fsapi，只能限定范围了
RUN pip3 install "fschat[model_worker,webui]>=0.2.28"

CMD /bin/bash

生成镜像

docker build -t fastchat -f Dockerfile-fastchat .

先起个临时容器用于下载模型数据和测试

docker run -it --rm \
-e HF_ENDPOINT=https://hf-mirror.com \
-e HF_HUB_ENABLE_HF_TRANSFER=0 \
-v /home/ubuntu/models:/model \
-v /home/ubuntu/hf-cache:/cache \
--gpus all \
fastchat \
/bin/bash

容器内下载模型数据

#将HF_HUB_ENABLE_HF_TRANSFER设为1可加快下载速度，但如果网络不稳定或需要看下载进度可改为0
huggingface-cli download --resume-download \
lmsys/vicuna-7b-v1.5-16k \
--local-dir=/model/lmsys/vicuna-7b-v1.5-16k \
--local-dir-use-symlinks=False \
--cache-dir=/cache

启动FastChat命令行模式测试下能不能用

#gpu模式 Vicuna-13B大概需要28GB显存，Vicuna-7B大概需要14GB显存
python3 -m fastchat.serve.cli --model-path /model/lmsys/vicuna-7b-v1.5-16k

#cpu模式 Vicuna-13B大概需要60GB内存，Vicuna-7B大概需要30GB内存
python3 -m fastchat.serve.cli --model-path/model/lmsys/vicuna-7b-v1.5-16k --device cpu

退出测试容器后就能开始正式架设FastChat了

架设FastChat 服务

启动controller 默认端口21001

docker run -d \
--restart unless-stopped \
-e TZ=Asia/Shanghai \
--network fastchat \
--name fastchat-center \
fastchat \
python3 -m fastchat.serve.controller --host=0.0.0.0

启动worker默认端口21002

docker run -d \
--restart unless-stopped \
-e TZ=Asia/Shanghai \
--network fastchat \
--name fastchat-worker1 \
-v /home/ubuntu/models:/model \
--gpus all \
fastchat \
python3 -m fastchat.serve.model_worker \
--model-path /model/lmsys/vicuna-7b-v1.5-16k \
--host=0.0.0.0 \
--controller-address=http://fastchat-center:21001 \
--worker-address=http://fastchat-worker1:21002

启动WEB服务默认端口7860

docker run -d \
--restart unless-stopped \
-e TZ=Asia/Shanghai \
--network fastchat \
--name fastchat-web \
-p 80:7860 \
fastchat \
python3 -m fastchat.serve.gradio_web_server --controller-url=http://fastchat-center:21001

打开网页试下效果

image.png

如果你想自己写程序调用API还可以启动API服务默认端口8000

docker run -d \
--restart unless-stopped \
-e TZ=Asia/Shanghai \
--network fastchat \
--name fastchat-api \
-p 8000:8000 \
fastchat \
python3 -m fastchat.serve.openai_api_server  --host=0.0.0.0 --controller-address=http://fastchat-center:21001 --api-keys=66666

调API看一下模型参数

curl http://localhost:8000/v1/models

收工，全部搞完!~

参考:
https://cloud.tencent.com/developer/article/2297923
https://blog.csdn.net/alionsss/article/details/130027299
https://blog.csdn.net/jclian91/article/details/131650918
https://zhuanlan.zhihu.com/p/620801429
https://hf-mirror.com/

2.Docker运行FastChat

推荐阅读更多精彩内容