Introduction
Installing the NVIDIA driver, CUDA, and cuDNN on CentOS
NVIDIA driver
Download a suitable driver version from the NVIDIA website.
-
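Install lspci. The first command below finds which package provides lspci; it turns out to be pciutils, so install pciutils.
yum whatprovides '*/lspci'
yum install pciutils
Check whether an NVIDIA GPU is present (at the hardware level):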
lspci | grep -i nvidia
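The hardware check above can be wrapped in a small helper so that a setup script can branch on the result. This is only a sketch: the function name has_nvidia_gpu is hypothetical, and it simply looks for the string "nvidia" (case-insensitively) in whatever lspci output it receives on standard input.

```shell
# has_nvidia_gpu: reads `lspci` output on stdin and succeeds (exit 0)
# if any line mentions NVIDIA, ignoring case.
# The function name is hypothetical, chosen for this illustration.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Typical use on a real machine (left as a comment so the block has no side effects):
#   if lspci | has_nvidia_gpu; then echo "NVIDIA GPU found"; else echo "no NVIDIA GPU detected"; fi
```

Reading from standard input rather than calling lspci inside the function keeps the helper usable on saved lspci output as well.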
-
Install kernel-devel and kernel-headers
sudo yum install kernel-devel
sudo yum install kernel-headers
Make the driver installer executable
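The driver installer builds a kernel module, so the kernel sources installed by kernel-devel need to match the running kernel. A minimal sketch of such a check follows, assuming that kernel-devel places its files under /usr/src/kernels/<release> (the same directory used later for --kernel-source-path). The function name kernel_source_present is hypothetical.

```shell
# kernel_source_present RELEASE [ROOT]
# Succeeds if ROOT/RELEASE exists as a directory.
# ROOT defaults to /usr/src/kernels (assumed kernel-devel location on CentOS).
# The function name is hypothetical, chosen for this illustration.
kernel_source_present() {
    ksp_release=$1
    ksp_root=${2:-/usr/src/kernels}
    [ -d "$ksp_root/$ksp_release" ]
}

# Typical use (left as a comment so the block has no side effects):
#   kernel_source_present "$(uname -r)" || echo "kernel-devel does not match the running kernel"
```

If the check fails, the installed kernel-devel package was probably built for a different kernel than the one currently running; updating the kernel and rebooting, or installing the matching kernel-devel version, are the usual remedies.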
chmod a+x NVIDIA-Linux-x86_64-410.78.run
-
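Disable nouveau
# Open the configuration file:
vi /usr/lib/modprobe.d/dist-blacklist.conf
# Add or modify these two lines:
blacklist nouveau
options nouveau modeset=0
Check whether nouveau is disabled (the blacklist takes effect after a reboot); no output means it worked:
lsmod | grep nouveau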
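Before rebooting, it can be useful to confirm that both lines actually made it into the configuration file. The following is a sketch only; the function name nouveau_blacklisted is hypothetical, and it checks just the one file passed to it, not every file that modprobe reads.

```shell
# nouveau_blacklisted FILE
# Succeeds only if FILE contains both "blacklist nouveau" and
# "options nouveau modeset=0" as complete, uncommented lines.
# The function name is hypothetical, chosen for this illustration.
nouveau_blacklisted() {
    nb_conf=$1
    grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$nb_conf" &&
    grep -Eq '^[[:space:]]*options[[:space:]]+nouveau[[:space:]]+modeset=0[[:space:]]*$' "$nb_conf"
}

# Typical use (left as a comment so the block has no side effects):
#   nouveau_blacklisted /usr/lib/modprobe.d/dist-blacklist.conf || echo "nouveau is not fully blacklisted"
```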
-
Optional
Back up the original initramfs image (the one that includes nouveau):
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
Create a new initramfs image:
dracut /boot/initramfs-$(uname -r).img $(uname -r)
-
Install
Run:
sudo ./NVIDIA-Linux-x86_64-410.78.run
If it reports an error, pass the kernel source path explicitly:
sudo ./NVIDIA-Linux-x86_64-410.78.run --kernel-source-path=/usr/src/kernels/<press TAB to complete>
CUDA
Choose a suitable version on NVIDIA's CUDA download page and download it.
Make the installer executable
chmod a+x cuda_10.0.130_410.48_linux.run
-
Install
sudo ./cuda_10.0.130_410.48_linux.run
1. A license statement is displayed first; keep pressing D to page through it, then type accept.
2. A series of prompts follows:
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: n
Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: n
For the "install the OpenGL libraries" option: choose n on a machine with two GPUs (integrated + discrete); choose y only if there is just a discrete GPU. Choosing y on a dual-GPU machine leads to a black screen or a login loop. If the parameter mentioned above was passed, this option does not appear.
3. When the installation finishes, the following information is printed:
===========
= Summary =
===========
Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Not Selected
Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_11482.log
-
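Add CUDA's bin and lib64 directories to the environment. If your version differs, replace cuda-x.x in the paths accordingly.
For the current shell only:
export PATH="/usr/local/cuda-10.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"
Or, to make it permanent, open ~/.bashrc:
vi ~/.bashrc
add these two lines:
export PATH="/usr/local/cuda-10.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"
and reload it:
source ~/.bashrc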
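Sourcing ~/.bashrc repeatedly with the exports above prepends the same directory again each time, and an empty LD_LIBRARY_PATH leaves a trailing colon. The sketch below shows one way to avoid both; the helper name prepend_once is hypothetical and not part of CUDA or bash.

```shell
# prepend_once DIR LIST
# Prints LIST with DIR prepended, unless DIR is already one of the
# colon-separated entries. An empty LIST yields just DIR (no trailing colon).
# The function name is hypothetical, chosen for this illustration.
prepend_once() {
    po_dir=$1
    po_list=$2
    case ":$po_list:" in
        *":$po_dir:"*) printf '%s\n' "$po_list" ;;
        *)
            if [ -n "$po_list" ]; then
                printf '%s\n' "$po_dir:$po_list"
            else
                printf '%s\n' "$po_dir"
            fi
            ;;
    esac
}

# Typical use in ~/.bashrc (left as comments so the block has no side effects):
#   export PATH="$(prepend_once /usr/local/cuda-10.0/bin "$PATH")"
#   export LD_LIBRARY_PATH="$(prepend_once /usr/local/cuda-10.0/lib64 "$LD_LIBRARY_PATH")"
```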
-
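Test: if the sample tests below all end with Result = PASS, CUDA was installed successfully.
First check the compiler; on success it prints the version information:
nvcc -V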
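To compare the installed toolkit version against an expected one in a script, the release number can be extracted from the nvcc output. This sketch assumes the output contains a line of the form "Cuda compilation tools, release 10.0, V10.0.130"; the function name cuda_release is hypothetical.

```shell
# cuda_release: reads `nvcc -V` output on stdin and prints the release
# number (for example 10.0) taken from the "release X.Y," line.
# Assumes the line format "Cuda compilation tools, release X.Y, VX.Y.Z".
# The function name is hypothetical, chosen for this illustration.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Typical use (left as a comment so the block has no side effects):
#   [ "$(nvcc -V | cuda_release)" = "10.0" ] || echo "unexpected CUDA release"
```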
-
Compile and run the device test, deviceQuery (use the directory of the version you installed):
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
-
Compile and run the bandwidth test, bandwidthTest:
cd ../bandwidthTest
sudo make
./bandwidthTest
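Since success is defined as both samples ending with Result = PASS, the check can be automated by scanning their output for that string. A small sketch, with a hypothetical function name:

```shell
# sample_passed: reads a CUDA sample's output on stdin and succeeds
# (exit 0) if it contains the line fragment "Result = PASS".
# The function name is hypothetical, chosen for this illustration.
sample_passed() {
    grep -q 'Result = PASS'
}

# Typical use from a sample directory (left as a comment so the block has no side effects):
#   ./deviceQuery | sample_passed && echo "deviceQuery OK" || echo "deviceQuery FAILED"
```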
-
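Other
If the required libcudart.so.8.0 is installed correctly but still cannot be found, either of the following two commands has the same effect (the paths in this section use CUDA 8.0; substitute your own version):
sudo ldconfig /usr/local/cuda-8.0/lib64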
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64
- If it still does not work, try running:
export PATH=$PATH:/usr/local/cuda-8.0/bin
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-8.0/lib64
source /etc/profile
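- The following message will then be displayed:
/sbin/ldconfig.real: /usr/local/cuda-8.0/lib64/libcudnn.so.6 is not a symbolic link
This is nothing to worry about; the problem is already solved at this point.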
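To confirm that the dynamic linker can now find the library, the linker cache can be listed with ldconfig -p and searched for the library name. A minimal sketch, with a hypothetical function name:

```shell
# cache_has_lib NAME
# Reads `ldconfig -p` output on stdin and succeeds (exit 0) if NAME
# appears in it as a literal string.
# The function name is hypothetical, chosen for this illustration.
cache_has_lib() {
    grep -Fq "$1"
}

# Typical use (left as a comment so the block has no side effects):
#   ldconfig -p | cache_has_lib libcudart.so.8.0 || echo "libcudart.so.8.0 is not in the linker cache"
```

Note that this only reflects directories registered through ldconfig; a library reachable solely through LD_LIBRARY_PATH does not show up in the cache.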
Installing cuDNN
Reference: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
Download the version that matches your CUDA version from NVIDIA's cuDNN download page.
Extract
tar -xzvf cudnn-10.0-linux-x64-v7.tgz
-
Copy (the archive extracts into a directory named cuda)
sudo cp cuda/include/cudnn.h /usr/local/cuda-10.0/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
Grant read permission
sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
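After copying, the installed cuDNN version can be read back from the header. This sketch assumes a cuDNN 7 style cudnn.h that defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL on separate #define lines; the function name cudnn_version is hypothetical.

```shell
# cudnn_version HEADER
# Prints MAJOR.MINOR.PATCHLEVEL taken from the #define lines in HEADER.
# Prints nothing if CUDNN_MAJOR is not defined there.
# Assumes cuDNN 7 style macros; the function name is hypothetical.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Typical use (left as a comment so the block has no side effects):
#   cudnn_version /usr/local/cuda-10.0/include/cudnn.h
```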