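Setting up TensorFlow on an RTX 3090: building from source with CUDA 11.1 on Ubuntu 20.10

At the time of writing, the official TensorFlow 2.x releases support CUDA only up to 10.1, whereas the RTX 3090 requires CUDA 11 or later. This walkthrough therefore builds a CUDA 11 version of TensorFlow from source.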

System: Ubuntu 20.10
Hardware: Intel Core i9 10900X, 128 GB RAM, two RTX 3090 cards
Python versions are managed with Anaconda

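(I) Preparing the environment

The following are assumed to be installed already:

NVIDIA GPU driver (455.38), installed through Ubuntu's Additional Drivers tool
Anaconda3 (or Miniconda3), configured with a package mirror in mainland China
The system gcc, version 10.2
The vim and VS Code editors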

(1) Install CUDA 11.1

Open the download page: https://developer.nvidia.com/cuda-toolkit-archive

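Select the latest release, 11.1.1.

Then choose Linux, x86_64, Ubuntu, 20.04, and the installer type runfile (local). The page then shows the installation commands. The example below is for 11.1.0; use the commands the page shows for the release you selected: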

wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run
sudo sh cuda_11.1.0_455.23.05_linux.run

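The installer is large (3.5 GB), so cd to your download directory before running the commands.

Note: for a faster download you can use mwget -n 32, where the number sets how many download threads to use.

After the download finishes, run the second command to install. The installer warns that a driver is already installed; choose Continue, then type accept. Because the GPU driver is already present, move to the Driver entry and press Space to deselect it, then choose Install.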

After installation, set the environment variables. Open:

vim ~/.bashrc

(If vim is not installed, install it first with sudo apt install vim.)

Append the following to the end of the file:

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_HOME}/lib64
export PATH=${CUDA_HOME}/bin:${PATH}

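(Note: the CUDA path deliberately omits the version number, so this configuration does not need to change when CUDA is upgraded later.)

Save the file, then reload the environment variables: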

source ~/.bashrc

Finally, run nvcc -V to verify that CUDA is installed; it prints the version number and related information.
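
For scripted checks, the release number can be pulled out of the nvcc -V output. The following is a minimal sketch rather than part of the original procedure: the helper names parse_nvcc_release and installed_cuda_release are made up for illustration, and the regular expression assumes the usual "release X.Y" wording in the nvcc banner.

```python
import re
import subprocess


def parse_nvcc_release(output):
    """Return (major, minor) parsed from `nvcc -V` text, or None if absent."""
    # Assumption: the banner contains a phrase such as "release 11.1".
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` and parse it; return None if nvcc cannot be run."""
    try:
        result = subprocess.run(
            ["nvcc", "-V"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_release())
```

If this prints None, nvcc is probably not on PATH, which usually means the ~/.bashrc changes have not been loaded in the current shell.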

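(2) Set up cuDNN 8.0.5

Open the download page: https://developer.nvidia.com/rdp/cudnn-download

You need to log in to (or register) an NVIDIA account; on a first download you may also have to fill in a survey. Then pick the version as prompted; the one chosen here is cuDNN Library for Linux (x86_64).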

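If the browser download is too slow, start the download, right-click to copy the link, cancel the download, and fetch the link with mwget in a terminal instead.

Then extract the archive: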

tar -zxvf cudnn-11.1-linux-x64-v8.0.5.39.tgz

Next, copy the cuDNN files into the CUDA directory:

sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

Finally, verify that cuDNN is in place and print its version:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

The output is the relevant #define lines.
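
The same check can be done programmatically by reading the three version defines from the header. This is only a sketch: the function names are illustrative, and it assumes the header holds plain "#define CUDNN_MAJOR 8" style lines, which is what the grep command above relies on as well.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn_version.h."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 8".
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


def read_cudnn_version(path="/usr/local/cuda/include/cudnn_version.h"):
    """Read the header at `path`; return None if it cannot be opened."""
    try:
        with open(path) as handle:
            return parse_cudnn_version(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_cudnn_version())
```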

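(3) Set up TensorRT

Open the download page: https://developer.nvidia.com/tensorrt

Log in with your NVIDIA account (a survey may again be required), submit it, and click Download Now. The version chosen here is "TensorRT 7.2.1 for Ubuntu 18.04 and CUDA 11.1 TAR package".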

mwget can again be used to speed up the download. Then extract the archive and set it up:

tar -zxvf TensorRT-7.2.1.6.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz
sudo mv TensorRT-7.2.1.6 /usr/local/TensorRT-7.2.1.6
sudo ln -s /usr/local/TensorRT-7.2.1.6 /usr/local/tensorrt

Then set the environment variable. Open:

vim ~/.bashrc

Append the following to the end of the file:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/tensorrt/lib

Then reload the environment variables:

source ~/.bashrc
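
A quick way to confirm that both library directories ended up on LD_LIBRARY_PATH is sketched below. The helper name missing_library_dirs is an assumption made for this example; the two directories are the ones configured in the steps above.

```python
import os


def missing_library_dirs(required, env_value=None):
    """Return the directories in `required` that LD_LIBRARY_PATH lacks."""
    if env_value is None:
        env_value = os.environ.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is colon-separated; skip empty entries.
    present = {os.path.normpath(p) for p in env_value.split(":") if p}
    return [d for d in required if os.path.normpath(d) not in present]


if __name__ == "__main__":
    wanted = ["/usr/local/cuda/lib64", "/usr/local/tensorrt/lib"]
    print(missing_library_dirs(wanted))
```

An empty list means both directories are present in the current shell's environment.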

(4) Set up the Python interface

Ubuntu 20.10 ships Python 3.8, but TensorRT only supports Python up to 3.7, so create a new Anaconda virtual environment with Python 3.7:

conda create -n tf2 python=3.7
conda activate tf2

Then, inside the (tf2) environment, run:

cd /usr/local/tensorrt/python
pip install tensorrt-7.2.1.6-cp37-none-linux_x86_64.whl

(5) Install UFF and graphsurgeon

cd /usr/local/tensorrt/uff
pip install uff-0.6.9-py2.py3-none-any.whl

cd /usr/local/tensorrt/graphsurgeon
pip install graphsurgeon-0.4.5-py2.py3-none-any.whl
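
To confirm that the three wheels are importable from the tf2 environment, something like the following can be run. It is a sketch under assumptions: module_version is a made-up helper, and it reports "unknown" for any module that does not expose a __version__ attribute.

```python
import importlib


def module_version(name):
    """Import `name` and return its __version__, "unknown" if the module
    has no such attribute, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("tensorrt", "uff", "graphsurgeon"):
        print(name, module_version(name))
```

Note that uff and graphsurgeon may fail to import until TensorFlow itself is installed in the environment, so a None for those two at this stage is not necessarily a problem.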

(6) Verify the TensorRT installation

Build and run the MNIST sample:

cd /usr/local/tensorrt/samples/sampleMNIST
make
/usr/local/tensorrt/bin/sample_mnist

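(7) Install Bazel

Open the download page: https://github.com/bazelbuild/bazel/releases

Expand Assets and download bazel-3.7.0-linux-x86_64 (use whatever the latest version number is); mwget can speed this up.

The downloaded file is itself the executable. Set it up as follows: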

mkdir -p ~/bin
mv bazel-3.7.0-linux-x86_64 ~/bin/
cd ~/bin/
chmod +x bazel-3.7.0-linux-x86_64
ln -s bazel-3.7.0-linux-x86_64 bazel

Next, set the environment variables:

vim ~/.bashrc

Append the following to the end of the file:

export BAZEL_HOME=/home/lab
export PATH=${PATH}:${BAZEL_HOME}/bin

(Replace lab with your own username.)

Then reload the environment variables:

source ~/.bashrc

Finally, run bazel to verify that the installation succeeded.
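
The check can also be scripted. The sketch below looks for bazel on PATH and parses its version; the function names are illustrative, and it assumes that bazel --version prints a line of the form "bazel X.Y.Z".

```python
import re
import shutil
import subprocess


def parse_bazel_version(output):
    """Return (major, minor, patch) from `bazel --version` text, or None."""
    match = re.search(r"bazel (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_bazel_version():
    """Locate bazel on PATH and return its version, or None if not found."""
    path = shutil.which("bazel")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_bazel_version(result.stdout)
```

Calling installed_bazel_version() from a fresh shell is a reasonable way to confirm that the PATH change in ~/.bashrc took effect.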

(II) Building TensorFlow

(1) Get the source code

First cd to the directory where you want the source to live:

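# Official repository
git clone https://github.com/tensorflow/tensorflow.git
# Alternatively, use a mirror that is faster from mainland China (run only one of the two clone commands)
git clone https://hub.fastgit.org/tensorflow/tensorflow.git
cd tensorflow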

(2) Configure the build options

Make sure Anaconda has been switched to the virtual environment created earlier, then run:

./configure

When asked for the Python location, just press Enter if the default path is correct.

You have bazel 3.7.0 installed.
Please specify the location of python. [Default is /home/lab/miniconda3/envs/tf2/bin/python3]:

Found possible Python library paths:
/home/lab/miniconda3/envs/tf2/lib/python3.7/site-packages
Please input the desired Python library path to use. Default is [/home/lab/miniconda3/envs/tf2/lib/python3.7/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Answer y to both the CUDA and the TensorRT prompts. Then set the CUDA version to 11.1, cuDNN to 8, and TensorRT to 7. NCCL was not set up here, so leave it at the default:

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10]: 11.1
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 8
Please specify the TensorRT version you want to use. [Leave empty to default to TensorRT 6]: 7
Please specify the locally installed NCCL version you want to use. [Leave empty to use http://github.com/nvidia/nccl]:

Then set the CUDA paths:

Please specify the comma-separated list of base paths to look for CUDA libraries and headers. [Leave empty to use the default]:

Enter the following:

/lib,/lib/x86_64-linux-gnu,/usr,/usr/lib/x86_64-linux-gnu/libfakeroot,/usr/local/cuda,/usr/local/cuda/targets/x86_64-linux/lib,/usr/local/tensorrt

Then set the GPU compute capability to build for:

Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 8.6,8.6]:

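For the RTX 3090, enter 8.6 and press Enter. (For other GPUs, look up the value at https://developer.nvidia.com/zh-cn/cuda-gpus#collapseOne)

Answer N to using clang, keep the default gcc host compiler, press Enter to accept the default optimization flags, and answer N to the Android option: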

Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.

Finally the following is printed, and configuration is complete.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=mkl_aarch64 # Build with oneDNN support for Aarch64.
--config=monolithic # Config for mostly static monolithic build.
--config=numa # Build with NUMA support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
--config=v2 # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished

(3) Set up a GitHub mirror

With the TensorFlow source directory as the current directory and the Anaconda virtual environment active, edit the sources with VS Code:

code WORKSPACE

Press Ctrl+H and replace every "https://github.com" in the file with "https://hub.fastgit.org", then press Ctrl+S to save. Next, edit:

code tensorflow/workspace.bzl
code third_party/aws/workspace.bzl

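Replace in the same way; there are roughly 57 and 4 occurrences respectively. Replace them all, save, and close VS Code.

Note: if GitHub downloads are still slow later on, the TensorFlow sources may have changed since this was written. A full-text search lists the files that still contain the URL so that it can be replaced there (URLs in documentation and comments do not need replacing):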

grep -rIl "https://github.com/" .
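
The manual search-and-replace can also be scripted. The sketch below is one possible approach, not the method used in this article: rewrite_file and rewrite_all are made-up names, the file list is the three files edited above, and the mirror URL is the one this article uses.

```python
from pathlib import Path

GITHUB = "https://github.com"
MIRROR = "https://hub.fastgit.org"  # mirror used in this article

# The three files edited by hand in the steps above.
FILES = ("WORKSPACE", "tensorflow/workspace.bzl", "third_party/aws/workspace.bzl")


def rewrite_file(path, old=GITHUB, new=MIRROR):
    """Replace every `old` with `new` in one file; return how many were found."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        path.write_text(text.replace(old, new), encoding="utf-8")
    return count


def rewrite_all(root=".", names=FILES):
    """Rewrite each listed file under `root` that exists; return counts per file."""
    counts = {}
    for name in names:
        target = Path(root) / name
        if target.is_file():
            counts[name] = rewrite_file(target)
    return counts
```

Running rewrite_all() from the TensorFlow source directory returns a dictionary of replacement counts, which can be compared with the rough figures quoted above. Unlike the manual edit, this also rewrites URLs inside comments, which is harmless.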

(4) Build with Bazel

Build TensorFlow with CUDA support:

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

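Depending on network speed, the downloads take about 20 minutes. Depending on the CPU, compilation can take over an hour (that was the case on a Core i9 10900X), so just leave the machine running.

(5) Fixing a build error

Error message: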

ModuleNotFoundError: No module named 'keras_preprocessing'

Fix:

pip install keras_applications keras_preprocessing
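
Since the build runs for a long time, it can be worth checking for these modules before starting it. A minimal sketch, with missing_modules as an illustrative helper name; the module list covers only the two packages named above.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules(["keras_applications", "keras_preprocessing"]))
```

An empty list means both modules are available in the active environment.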

After installing these, rerun the build command; on success it prints "INFO: Build completed successfully".

(6) Build the whl package from the master branch

./bazel-bin/tensorflow/tools/pip_package/build_pip_package --nightly_flag /tmp/tensorflow_pkg

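Note: it is advisable not to change the output path /tmp/tensorflow_pkg, as doing so tends to cause problems.

(7) Install the generated package

First list the generated whl file: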

cd /tmp/tensorflow_pkg/
ls

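This shows the name of the generated file. Install that package (use the file name actually printed), after confirming that the correct Anaconda virtual environment is active.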

pip install tf_nightly-2.5.0-cp37-cp37m-linux_x86_64.whl

The installation pulls in the required dependencies automatically.
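
Because the wheel file name varies with the build, a script can look it up instead of hard-coding it. This is a sketch only; newest_wheel is a hypothetical helper, and it simply picks the most recently modified .whl file in the output directory used above.

```python
from pathlib import Path


def newest_wheel(directory="/tmp/tensorflow_pkg"):
    """Return the most recently modified .whl file in `directory`, or None."""
    wheels = sorted(Path(directory).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    print(newest_wheel())
```

The printed path can then be passed to pip install in the tf2 environment.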

(III) Testing whether the compiled TensorFlow can access the GPU

Open Python and run the following code:

from tensorflow.python.client import device_lib
def get_available_gpus():
  local_device_protos = device_lib.list_local_devices()
  return [x.name for x in local_device_protos if x.device_type == 'GPU']
print(get_available_gpus())

This prints some GPU-related information:

2020-11-19 12:16:47.387251: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-19 12:16:48.022598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-19 12:16:48.022639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1273] 0 1
2020-11-19 12:16:48.022644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1286] 0: N N
2020-11-19 12:16:48.022647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1286] 1: N N
2020-11-19 12:16:48.026415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1413] Created TensorFlow device (/device:GPU:0 with 22423 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:17:00.0, compute capability: 8.6)
2020-11-19 12:16:48.028475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1413] Created TensorFlow device (/device:GPU:1 with 21814 MB memory) -> physical GPU (device: 1, name: GeForce RTX 3090, pci bus id: 0000:65:00.0, compute capability: 8.6)
['/device:GPU:0', '/device:GPU:1']

If the output matches your actual GPUs, TensorFlow can access them.
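
An alternative check, sketched here as an option rather than the method used above, goes through the public tf.config API available in TensorFlow 2.x. The function name visible_gpus is an assumption for this example, and it returns None when TensorFlow cannot be imported at all.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see,
    or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(visible_gpus())
```

On the machine described in this article, the list would be expected to contain two entries, one per RTX 3090.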

You can then open VS Code, switch its Anaconda environment to the virtual environment set up here, and test your own TensorFlow programs.