LLM Development Environment Setup

LLM fine-tuning development involves the following hardware and software.

Hardware

GPU: NVIDIA RTX 4090, 24 GB VRAM
CPU: 16 cores
RAM: 32 GB
Disk: SSD, 200 GB system disk
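
A quick way to compare a machine against the spec above is to print what it actually has. The following is a minimal sketch, assuming a Linux host; report_hardware is a helper name made up for illustration, and the GPU line only works once the NVIDIA driver (which ships nvidia-smi) is installed.

```shell
# Print the local GPU, CPU, RAM and system-disk figures.
# report_hardware is an illustrative helper, not part of any tool.
report_hardware() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One line per GPU: model name and total VRAM.
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
            || echo "GPU: nvidia-smi is installed but could not query the driver"
    else
        echo "GPU: nvidia-smi not found (is the NVIDIA driver installed?)"
    fi
    echo "CPU cores: $(nproc)"
    # /proc/meminfo reports MemTotal in kB; convert to GiB.
    awk '/^MemTotal:/ {printf "RAM: %.1f GiB\n", $2 / 1024 / 1024}' /proc/meminfo
    # -P keeps each filesystem on one line, so row 2 is always "/".
    df -hP / | awk 'NR == 2 {print "System disk: " $2 " total, " $4 " available"}'
}

report_hardware
```

A machine with 32 GB installed will typically report slightly less, since part of the memory is reserved by the firmware and kernel.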

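Software

Operating system

Ubuntu 22.04 LTS is currently the go-to operating system for LLM development and offers the broadest compatibility.
Reference: https://www.doubao.com/thread/wb42629d1a7568826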

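miniconda3

Use miniconda3 to create sandboxed environments that meet the differing requirements of individual projects. For LLM fine-tuning, though, the trend is for different projects to share a single development environment.

mkdir -p "$HOME/software" && cd "$HOME/software"
# Download the installer from the Tsinghua mirror (skip if Miniconda3.sh is already present)
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py310_25.5.1-0-Linux-x86_64.sh -O Miniconda3.sh
# -b: batch mode (no prompts), -p: install prefix
sh Miniconda3.sh -b -p "$HOME/software/miniconda3"
"$HOME/software/miniconda3/bin/conda" init
# Open a new shell first so that conda's pip is on PATH, then point pip at the Aliyun mirror
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple
pip install -U pip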
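
Creating the shared environment might look like the sketch below. The environment name llm is an assumption chosen for illustration; Python 3.10 matches the version this document settles on. The block only acts when conda is already on PATH.

```shell
# ENV_NAME is a placeholder; pick any name you like.
ENV_NAME="llm"

if command -v conda >/dev/null 2>&1; then
    # Make `conda activate` usable inside a non-interactive script.
    . "$(conda info --base)/etc/profile.d/conda.sh"
    # Create the environment only if it does not exist yet.
    if ! conda env list | awk '{print $1}' | grep -qx "$ENV_NAME"; then
        conda create -y -n "$ENV_NAME" python=3.10
    fi
    conda activate "$ENV_NAME"
    python --version
else
    echo "conda not found; install Miniconda first"
fi
```

Packages installed with pip after `conda activate` land inside this environment rather than in the base installation.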

NVIDIA

nvidia-driver: latest  # just keep it current; newer drivers are backward compatible
cudatoolkit 12.4  # currently the best-supported version
cudnn: must match the cudatoolkit version
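
To see which versions are actually in place, a check along these lines can help. It is a sketch under assumptions: check_cuda_stack is an illustrative name, nvcc is only present when the full CUDA toolkit is installed, and the last step assumes PyTorch, which this document does not list explicitly, is importable.

```shell
# Report the driver, the CUDA toolkit and, if PyTorch is installed,
# the CUDA and cuDNN versions it was built against.
check_cuda_stack() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    else
        echo "driver: nvidia-smi not found"
    fi
    if command -v nvcc >/dev/null 2>&1; then
        # The "release" line carries the toolkit version.
        echo "toolkit: $(nvcc --version | grep release)"
    else
        echo "toolkit: nvcc not found"
    fi
    if python3 -c "import torch" >/dev/null 2>&1; then
        python3 -c "import torch; print('torch CUDA:', torch.version.cuda, '| cuDNN:', torch.backends.cudnn.version())"
    else
        echo "torch: not installed in this environment"
    fi
}

check_cuda_stack
```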

python

3.10  # currently the best-supported version

ms-swift

An end-to-end framework for LLM fine-tuning.

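pip install -U 'ms-swift'  # keep it on the latest release

If newer versions of datasets and numpy are already present in the environment, pip may report conflicts like the following:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.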
ms-swift 3.6.1 requires datasets<3.4,>=3.0, but you have datasets 3.6.0 which is incompatible.
ms-swift 3.6.1 requires numpy<2.0, but you have numpy 2.2.6 which is incompatible.
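
One way to clear these conflicts is to hold the two packages inside the bounds that ms-swift reports. The sketch below writes those bounds to a pip constraints file; the bounds are copied from the messages above, while the file name and the function name are assumptions made for illustration.

```shell
# Pins copied from the resolver messages above; adjust them if a
# different ms-swift release reports different bounds.
cat > constraints.txt <<'EOF'
datasets>=3.0,<3.4
numpy<2.0
EOF

# Reinstall ms-swift while keeping its dependencies inside those bounds,
# then ask pip to re-verify the whole environment.
install_ms_swift_pinned() {
    python -m pip install -U ms-swift -c constraints.txt && python -m pip check
}
```

Run `install_ms_swift_pinned` inside the activated environment. `pip check` exits non-zero if any installed package still has unmet requirements.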

vllm

The preferred framework for deployment.

pip install vllm
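
Once installed, vLLM can expose a model through an OpenAI-compatible HTTP server. A minimal sketch follows; the model ID and the port are placeholders chosen for illustration, not values from these notes.

```shell
# MODEL and PORT are illustrative placeholders.
MODEL="Qwen/Qwen2.5-0.5B-Instruct"
PORT=8000

# Start the OpenAI-compatible server (blocks until interrupted).
serve_model() {
    vllm serve "$MODEL" --port "$PORT"
}

# From a second terminal: list the models the server is exposing.
list_served_models() {
    curl -s "http://localhost:${PORT}/v1/models"
}
```

Call `serve_model` in one terminal and `list_served_models` in another once the server has finished loading.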

docker

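Pull prebuilt images to cut down on environment setup and on the number of packages installed on the host.
Reference: https://www.doubao.com/thread/w35454dca77b1b0be
On Ubuntu, installing it with apt-get install is enough.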
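
Spelled out, the apt-get route and a first smoke test could look like this. It is a sketch, assuming Ubuntu's docker.io package; the GPU test additionally assumes the NVIDIA Container Toolkit has been installed and configured, which these notes do not cover. The function names are illustrative.

```shell
# Install Docker from Ubuntu's own repository.
install_docker() {
    sudo apt-get update && sudo apt-get install -y docker.io
}

# Confirm the daemon can pull and run an image.
docker_smoke_test() {
    sudo docker run --rm hello-world
}

# Confirm containers can see the GPU (needs the NVIDIA Container Toolkit).
docker_gpu_test() {
    sudo docker run --rm --gpus all ubuntu:22.04 nvidia-smi
}
```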

homebrew

Needed for installing llama.cpp.

brew install llama.cpp

Reference: https://www.doubao.com/thread/w93205fead47ae7b1

llama.cpp

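Genuinely fast, and an excellent tool for quantization.

Build from source: build locally with cmake
Install the prebuilt package with Homebrew: brew install llama.cpp [recommended]
Install Docker and pull the llama.cpp image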
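
For the source-build option, the steps might look like the sketch below, followed by a quantization call. The function names and both GGUF file names are hypothetical; the CUDA flag assumes the CUDA toolkit is installed and can be dropped for a CPU-only build.

```shell
# Build llama.cpp from source with CMake.
build_llama_cpp() {
    git clone https://github.com/ggml-org/llama.cpp &&
    # -DGGML_CUDA=ON enables the CUDA backend.
    cmake -S llama.cpp -B llama.cpp/build -DGGML_CUDA=ON &&
    cmake --build llama.cpp/build --config Release -j
}

# Quantize a 16-bit GGUF file to 4-bit (Q4_K_M).
# model-f16.gguf and model-q4_k_m.gguf are placeholder file names.
quantize_example() {
    llama.cpp/build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
}
```

With the Homebrew install, the same `llama-quantize` binary is on PATH and no build directory is involved.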

unsloth

Fast training, low VRAM usage, and plenty of notebook tutorials.

pip install unsloth

ollama

Active community, easy to get started with, simple model deployment, and models are loaded on demand; a versatile all-rounder.

curl -fsSL https://ollama.com/install.sh | sh
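
After the install script finishes, a first session could look like the following sketch. The model tag is a placeholder picked for illustration, and the function names are made up for this example.

```shell
# MODEL_TAG is an illustrative placeholder; any tag from the Ollama library works.
MODEL_TAG="qwen2.5:0.5b"

ollama_demo() {
    # Download the model weights.
    ollama pull "$MODEL_TAG" &&
    # Send a one-shot prompt; the model is loaded on demand.
    ollama run "$MODEL_TAG" "Say hello in one sentence." &&
    # Show the models available locally.
    ollama list
}

# The local server listens on port 11434 by default;
# this endpoint lists the installed models as JSON.
ollama_api_check() {
    curl -s http://localhost:11434/api/tags
}
```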