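1 Fixing "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver"
On an Ubuntu 20.04 server, a network outage in the machine room led to a reboot, after which nvidia-smi could no longer show the GPUs. The error was: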
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
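After some research and hands-on testing, the cause turned out to be a kernel update: the reboot brought the server up on a newer kernel, and the previously installed GPU driver module had not been built for it.
Solution:
List the installed kernels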
dpkg --get-selections |grep linux-image
Check the version of the running kernel
uname -r
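To confirm the diagnosis, it can help to check whether the running kernel has an nvidia module at all. The following is a minimal sketch, not part of the original procedure: the function name has_nvidia_module is hypothetical, and it assumes modules live under /lib/modules/<kernel release>, which is the usual layout on Ubuntu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a kernel's module tree contains an nvidia module file.
# $1: kernel release (default: the running kernel)
# $2: modules root   (default: /lib/modules, assumed standard layout)
has_nvidia_module() {
    local kernel="${1:-$(uname -r)}"
    local modules_root="${2:-/lib/modules}"
    local hit
    # Search the whole tree; the module may be compressed, so match nvidia.ko*.
    hit="$(find "$modules_root/$kernel" -name 'nvidia.ko*' -print -quit 2>/dev/null)"
    [ -n "$hit" ]
}

if has_nvidia_module; then
    echo "nvidia module found for kernel $(uname -r)"
else
    echo "no nvidia module found for kernel $(uname -r)"
fi
```

If the second message appears for the running kernel while an older installed kernel does have the module, that matches the mismatch described above.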
Find the driver source directory (its name, e.g. nvidia-515.65.01, gives the driver version)
cd /usr/src
ls
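Instead of reading the version off the ls output by eye, it can be extracted from the directory name. This is only a sketch: the function name nvidia_src_versions is hypothetical, and it assumes the driver sources sit in a directory named nvidia-<version>, which may not hold for every installation method.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version part of every nvidia-<version>
# directory under a source root.
# $1: source root (default: /usr/src)
nvidia_src_versions() {
    local src_root="${1:-/usr/src}"
    local dir
    for dir in "$src_root"/nvidia-*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        dir="${dir%/}"                 # drop the trailing slash
        echo "${dir##*/nvidia-}"       # keep only what follows "nvidia-"
    done
}

nvidia_src_versions
```

The printed value is what goes after -v in the dkms command.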
Install DKMS (Dynamic Kernel Module Support)
sudo apt-get install dkms
# Use the version number found on your own system
sudo dkms install -m nvidia -v 515.65.01
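After the install step, the result can be checked before running nvidia-smi again. The sketch below reads the text printed by dkms status and reports whether an nvidia module is listed as installed for a given kernel. The function name nvidia_installed_for is hypothetical, and the parsing assumes status lines of the form "nvidia, <version>, <kernel>, <arch>: installed" (or "nvidia/<version>, ..." on newer DKMS releases); other formats would need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `dkms status` text on stdin and succeed if an
# nvidia module is reported as installed for the kernel release given in $1.
nvidia_installed_for() {
    awk -v k="$1" '
        $0 ~ "^nvidia[,/]" && index($0, k) && /installed/ { found = 1 }
        END { exit !found }
    '
}

# Run the check only where dkms is actually available.
if command -v dkms >/dev/null 2>&1; then
    if dkms status | nvidia_installed_for "$(uname -r)"; then
        echo "nvidia module installed for $(uname -r); try nvidia-smi again"
    else
        echo "nvidia module not installed for $(uname -r)"
    fi
fi
```

If the module is reported as installed but nvidia-smi still fails, loading it with sudo modprobe nvidia or rebooting once more may be needed.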
With that, the problem is resolved for now.