1. Running nvidia-smi fails:
root@amax:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
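2. The most common cause is a mismatch between the Ubuntu kernel version and the NVIDIA driver version, which happens when the kernel is upgraded automatically across a reboot. (I had indeed rebooted, but I don't know whether the kernel was upgraded.)
However, the NVIDIA driver directory under /usr/src on my machine has no dkms.conf file, so the fix usually suggested online, shown here, was not an option for me, sadly: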
sudo apt install dkms
sudo dkms install -m nvidia -v xxx.xx.xx
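A rough way to check the two things this step depends on: which kernel is running, and whether the driver source tree under /usr/src carries a dkms.conf. The function name list_nvidia_src is made up for this note, and the nvidia-<version> directory naming is an assumption about how the driver packages lay out /usr/src. The version it prints is the kind of value that would stand in for xxx.xx.xx.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list nvidia-* source directories under a root
# (default /usr/src) and report whether each one has a dkms.conf.
list_nvidia_src() {
    local root="${1:-/usr/src}"
    local dir version
    for dir in "$root"/nvidia-*/; do
        [ -d "$dir" ] || continue           # glob matched nothing
        dir="${dir%/}"                      # drop the trailing slash
        version="${dir##*/nvidia-}"         # nvidia-510.39.01 -> 510.39.01
        if [ -f "$dir/dkms.conf" ]; then
            echo "$version dkms.conf=yes"
        else
            echo "$version dkms.conf=no"
        fi
    done
}

echo "running kernel: $(uname -r)"
list_nvidia_src /usr/src

# If dkms is installed, show which modules it has registered.
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
fi
```

A line ending in dkms.conf=no corresponds to the situation described above, where dkms install has nothing to work with.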
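3. On other machines the NVIDIA driver directory does contain a dkms.conf file. So I began to suspect that installing a newer CUDA version the following way had also upgraded the NVIDIA driver, and that the driver was either not installed properly or left in an inconsistent, mismatched state: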
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.6.0/local_installers/cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-6-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
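The local installer's file name already hints at why the driver may have changed: it appears to carry both a CUDA version and a driver version. A small sketch that splits the two out, assuming the name follows the pattern <repo>_<cuda>-<driver>-<rev>_amd64.deb as in the command above (parse_cuda_deb is a hypothetical helper, not an NVIDIA tool):

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a CUDA local-installer file name into the
# CUDA version and the driver version embedded in it.
parse_cuda_deb() {
    local name="${1##*/}"            # strip any directory or URL prefix
    name="${name%_amd64.deb}"        # drop the architecture suffix
    local versions="${name#*_}"      # keep what follows the first underscore
    local cuda="${versions%%-*}"     # text before the first dash
    local rest="${versions#*-}"      # text after the first dash
    local driver="${rest%-*}"        # drop the trailing package revision
    echo "cuda=$cuda driver=$driver"
}

parse_cuda_deb cuda-repo-ubuntu2004-11-6-local_11.6.0-510.39.01-1_amd64.deb
```

Comparing the driver value with the version that was installed before is one way to confirm or rule out the suspicion in step 3.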
4. Clean up CUDA and reinstall. Before installing, check whether /etc/apt/sources.list.d contains a matching offline (local) source; in my case it did.
apt-get purge 'nvidia*'
apt-get autoclean
apt-get autoremove
apt-get update
apt-get upgrade
apt-get install cuda
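The check for a local source mentioned at the start of step 4 is best done before the purge. A minimal sketch, assuming the local repository was registered in a .list file that mentions "cuda" (find_cuda_sources is a made-up name for this note):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the .list files in an apt sources directory
# (default /etc/apt/sources.list.d) that mention "cuda".
find_cuda_sources() {
    local dir="${1:-/etc/apt/sources.list.d}"
    # -l prints only names of matching files, -i ignores case
    grep -il 'cuda' "$dir"/*.list 2>/dev/null
}

find_cuda_sources || echo "no CUDA entry under /etc/apt/sources.list.d"

# Show which version of the cuda package apt would install, and from where.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache policy cuda || true
fi
```

If no file is listed, apt-get install cuda has no local repository to draw from and the .deb from step 3 would need to be registered again first.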
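5. After the installation finishes, check the /usr/local directory.
The installation succeeded. After editing the environment variables (vim /etc/profile && source /etc/profile), nvcc -V can be run directly.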
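The environment variables themselves are not shown above. A typical set looks like the following, assuming the toolkit is reachable at /usr/local/cuda (an assumption; use whichever cuda directory actually appears under /usr/local):

```shell
# Assumed install prefix; adjust to the directory found under /usr/local.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
export CUDA_HOME

# Put the CUDA binaries (nvcc) and libraries ahead of existing entries.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Quick verification: print the compiler version if nvcc is now on PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -V
else
    echo "nvcc not found under $CUDA_HOME/bin"
fi
```

The export lines would go at the end of /etc/profile; the verification at the bottom is only for an interactive check after source /etc/profile.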