Original article: https://makeoptim.com/deep-learning/tensorflow-gpu-on-ubuntu
- Introduction
- Required software
- Before installing
- NVIDIA machine learning
- NVIDIA GPU driver
- CUDA Toolkit and cuDNN
- TensorRT
- Miniconda
- Virtual environment
- Install TensorFlow
- Install JupyterLab and matplotlib
- Run TensorFlow in JupyterLab
- Further reading
- References
Introduction
Test environment:
- Ubuntu 18.04.5 LTS
- GTX 1070
- TensorFlow 2.4.1
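Required software
- NVIDIA® GPU drivers: CUDA® 11.0 requires 450.x or higher.
- CUDA® Toolkit: TensorFlow >= 2.4.0 requires CUDA® 11.
- cuDNN SDK 8.0.4 (see cuDNN versions).
- (Optional) TensorRT 6.0, to improve inference latency and throughput on some models. (The TensorRT commands in this guide install version 7.1.3, packaged as libnvinfer7.)
- Miniconda: for creating virtual environments.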
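The version constraints above can be collected in one place and checked together. This is a minimal sketch, not part of the original guide: the names MIN_DRIVER, REQUIRED_CUDA, MIN_CUDNN and check_stack are hypothetical, and it assumes the CUDA toolkit must be exactly 11.0 (the cuda-11-0 package installed later), while the driver and cuDNN 8.x only need to meet a minimum.

```python
# Hypothetical helper: encode the "Required software" list as version checks.
MIN_DRIVER = (450,)        # CUDA 11.0 needs driver 450.x or newer
REQUIRED_CUDA = (11, 0)    # assumption: toolkit must be exactly 11.0 (cuda-11-0)
MIN_CUDNN = (8, 0, 4)      # cuDNN SDK 8.0.4, staying within the 8.x series


def parse_version(text):
    """Turn a dotted version such as '460.32.03' into (460, 32, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def check_stack(driver, cuda, cudnn):
    """Return {component: bool} telling which installed versions qualify."""
    cudnn_version = parse_version(cudnn)
    return {
        # Tuples compare element by element, so (460, 32, 3) >= (450,) is True.
        "driver": parse_version(driver) >= MIN_DRIVER,
        # Only major.minor matters for the toolkit check.
        "cuda": parse_version(cuda)[:2] == REQUIRED_CUDA,
        # Stay on cuDNN 8 and at least 8.0.4.
        "cudnn": cudnn_version[0] == 8 and cudnn_version >= MIN_CUDNN,
    }
```

With the versions used in this guide, check_stack("460.32.03", "11.0", "8.0.4.30") reports True for all three components.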
Before installing
GCC
$ gcc --version
Command 'gcc' not found, but can be installed with:
sudo apt install gcc
$ sudo apt install gcc
$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
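This prerequisite check can also be scripted. A minimal sketch, assuming the output format shown above: the helpers parse_gcc_version and installed_gcc_version are hypothetical names, not from the original guide. The first reads the trailing version on the first line of gcc --version, and the second returns None when gcc is not on PATH (the "Command 'gcc' not found" case).

```python
import re
import shutil
import subprocess

# Matches the trailing version on the first line, e.g. "... 7.5.0"
_VERSION_AT_END = re.compile(r"(\d+)\.(\d+)\.(\d+)\s*$")


def parse_gcc_version(output):
    """Return (major, minor, patch) from `gcc --version` output, or None."""
    lines = output.splitlines()
    if not lines:
        return None
    match = _VERSION_AT_END.search(lines[0])
    return tuple(int(g) for g in match.groups()) if match else None


def installed_gcc_version():
    """Run `gcc --version` if gcc is on PATH; None means gcc still needs installing."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_gcc_version(result.stdout)
```

On the machine used in this guide, installed_gcc_version() would return (7, 5, 0) once gcc is installed.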
NVIDIA package repositories
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
$ sudo apt-get update
NVIDIA machine learning
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
$ sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
$ sudo apt-get update
NVIDIA GPU driver
$ sudo apt-get install --no-install-recommends nvidia-driver-460
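Note: use driver version 460 here. The TensorFlow website lists 450, but 450 failed in my testing.
Reboot, then check that the GPU is visible with the following command: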
$ nvidia-smi
Mon Apr 5 16:17:17 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 On | 00000000:01:00.0 On | N/A |
| 0% 48C P8 9W / 180W | 351MiB / 8111MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 997 G /usr/lib/xorg/Xorg 18MiB |
| 0 N/A N/A 1145 G /usr/bin/gnome-shell 53MiB |
| 0 N/A N/A 1353 G /usr/lib/xorg/Xorg 108MiB |
| 0 N/A N/A 1495 G /usr/bin/gnome-shell 83MiB |
| 0 N/A N/A 1862 G ...AAAAAAAAA= --shared-files 82MiB |
+-----------------------------------------------------------------------------+
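The header line of this output can be checked programmatically. Note that "CUDA Version: 11.2" in the header is the highest CUDA version the driver supports, not the toolkit that is installed (this guide installs CUDA 11.0). A minimal sketch, assuming the header format shown above; driver_info and driver_ok are hypothetical helper names, and the default minimum of 460 follows the note above rather than the documented 450.

```python
import re

# Header line, e.g. "| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |"
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def driver_info(smi_output):
    """Return (driver_version, max_supported_cuda) from nvidia-smi output, or None."""
    match = _HEADER.search(smi_output)
    return (match.group(1), match.group(2)) if match else None


def driver_ok(smi_output, minimum_major=460):
    """True if the driver's major version meets this guide's tested minimum."""
    info = driver_info(smi_output)
    return info is not None and int(info[0].split(".")[0]) >= minimum_major
```

To check a live system, pass it subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout.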
CUDA Toolkit and cuDNN
$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
$ sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
$ sudo apt-get update
# Install development and runtime libraries (~4GB)
$ sudo apt-get install --no-install-recommends \
cuda-11-0 \
libcudnn8=8.0.4.30-1+cuda11.0 \
libcudnn8-dev=8.0.4.30-1+cuda11.0
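The pinned versions above carry a +cuda11.0 tag that records which CUDA release the cuDNN build targets. After installing, you can confirm that the tags still match. A minimal sketch with hypothetical helper names (cuda_suffix, mismatched_packages, installed_versions); it reads "package version" lines as printed by dpkg-query -W with a --showformat of "${Package} ${Version}\n".

```python
import shutil
import subprocess


def cuda_suffix(deb_version):
    """'8.0.4.30-1+cuda11.0' -> '11.0'; None if the version has no +cuda tag."""
    _, sep, suffix = deb_version.partition("+cuda")
    return suffix if sep else None


def mismatched_packages(dpkg_output, expected_cuda="11.0"):
    """List packages whose +cudaX.Y tag differs from expected_cuda."""
    bad = []
    for line in dpkg_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        package, version = parts
        tag = cuda_suffix(version)
        if tag is not None and tag != expected_cuda:
            bad.append(package)
    return bad


def installed_versions(packages):
    """Ask dpkg-query for 'package version' lines; empty string off Debian/Ubuntu."""
    if shutil.which("dpkg-query") is None:
        return ""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package} ${Version}\\n", *packages],
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

For example, mismatched_packages(installed_versions(["libcudnn8", "libcudnn8-dev"])) should return an empty list after the commands above.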
TensorRT
$ sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
libnvinfer-dev=7.1.3-1+cuda11.0 \
libnvinfer-plugin7=7.1.3-1+cuda11.0
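Before installing TensorFlow, you can check whether the dynamic loader finds the libraries installed so far. A minimal sketch using ctypes; the sonames are assumptions based on the packages above (libcudart.so.11.0 and libcudnn.so.8 also appear in TensorFlow's own log in the verification step), and loadable and report are hypothetical helper names. TensorRT is optional, so False for TENSORRT_LIBS only matters if you need it.

```python
import ctypes

# Assumed sonames for the packages installed above (CUDA 11.0, cuDNN 8, TensorRT 7).
CUDA_LIBS = ["libcudart.so.11.0", "libcudnn.so.8"]
TENSORRT_LIBS = ["libnvinfer.so.7", "libnvinfer_plugin.so.7"]


def loadable(soname):
    """True if the dynamic loader can find and open the shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def report(sonames):
    """Map each soname to whether it can be loaded in this process."""
    return {name: loadable(name) for name in sonames}
```

For example, print(report(CUDA_LIBS + TENSORRT_LIBS)) lists each library with True or False.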
Miniconda
Download the Python 3.8 installer script from https://docs.conda.io/en/latest/miniconda.html.
Make the script executable:
$ chmod +x Miniconda3-latest-Linux-x86_64.sh
Run the installer:
$ ./Miniconda3-latest-Linux-x86_64.sh
Restart the terminal to activate conda.
Virtual environment
Create a virtual environment named tensorflow.
$ conda create -n tensorflow python=3.8.5
$ conda activate tensorflow
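Scripts and notebooks can confirm they run inside this environment: conda activate sets the CONDA_DEFAULT_ENV variable to the environment's name. A minimal sketch with hypothetical helper names (active_conda_env, env_ready); the defaults mirror the conda create command above (name tensorflow, Python 3.8).

```python
import os
import sys


def active_conda_env(environ=None):
    """Name of the activated conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def env_ready(name="tensorflow", python=(3, 8), environ=None, version_info=None):
    """True if `name` is active and the interpreter's major.minor matches `python`."""
    version_info = sys.version_info if version_info is None else version_info
    return active_conda_env(environ) == name and tuple(version_info[:2]) == python
```

Inside the activated environment, env_ready() should be True, and sys.executable should end in envs/tensorflow/bin/python.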
Install TensorFlow
$ pip install tensorflow==2.4.1
Verify the installation:
$ python -c "import tensorflow as tf;print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"
2021-04-05 16:20:00.426536: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-04-05 16:20:01.170305: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-05 16:20:01.170830: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-05 16:20:01.198917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-05 16:20:01.199497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.7845GHz coreCount: 15 deviceMemorySize: 7.92GiB deviceMemoryBandwidth: 238.66GiB/s
2021-04-05 16:20:01.199519: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-04-05 16:20:01.201250: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-04-05 16:20:01.201278: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-04-05 16:20:01.201995: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-05 16:20:01.202159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-05 16:20:01.203993: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-05 16:20:01.204412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-04-05 16:20:01.204499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-04-05 16:20:01.204566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-05 16:20:01.204897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-05 16:20:01.205168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Num GPUs Available: 1
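When verification fails, the log is the quickest clue. TensorFlow writes these messages to stderr, so you can capture them with 2> tf.log and scan the file. A minimal sketch with hypothetical helper names: the "opened" and "visible gpu devices" patterns come from the log above, while the "Could not load dynamic library" wording for missing libraries is an assumption about TensorFlow's warning format.

```python
import re

_OPENED = re.compile(r"Successfully opened dynamic library (\S+)")
# Assumed wording of TensorFlow's warning when a library is missing.
_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")
_VISIBLE = re.compile(r"Adding visible gpu devices: ([\d, ]+)")


def opened_libraries(log):
    """Shared libraries TensorFlow reported as opened, in log order."""
    return _OPENED.findall(log)


def missing_libraries(log):
    """Shared libraries TensorFlow reported it could not load."""
    return _MISSING.findall(log)


def visible_gpus(log):
    """GPU indices from the 'Adding visible gpu devices' line ([] if absent)."""
    match = _VISIBLE.search(log)
    if match is None:
        return []
    return [int(i) for i in match.group(1).replace(" ", "").split(",") if i]
```

For the log above, visible_gpus returns [0], and opened_libraries includes libcudart.so.11.0 and libcudnn.so.8.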
Install JupyterLab and matplotlib
$ pip install jupyterlab matplotlib
Run TensorFlow in JupyterLab
$ jupyter lab
JupyterLab opens automatically in your browser.
Download the CNN notebook from https://www.tensorflow.org/tutorials/images/cnn and import it.
Run "Restart Kernel and Run All Cells".
Once training starts, check the GPU processes. The ...nvs/tensorflow/bin/python entry shows that the model is being trained on the GPU.
$ nvidia-smi
Mon Apr 5 16:36:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 On | 00000000:01:00.0 On | N/A |
| 23% 54C P2 72W / 180W | 7896MiB / 8111MiB | 55% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 997 G /usr/lib/xorg/Xorg 18MiB |
| 0 N/A N/A 1145 G /usr/bin/gnome-shell 73MiB |
| 0 N/A N/A 1353 G /usr/lib/xorg/Xorg 136MiB |
| 0 N/A N/A 1495 G /usr/bin/gnome-shell 53MiB |
| 0 N/A N/A 1862 G ...AAAAAAAAA= --shared-files 99MiB |
| 0 N/A N/A 3181 C ...nvs/tensorflow/bin/python 7507MiB |
+-----------------------------------------------------------------------------+
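Instead of reading the process table by eye, you can filter it for compute processes (Type C or C+G), which is how the training process shows up above. A minimal sketch; compute_processes is a hypothetical helper, and the row pattern assumes the table layout shown here. For scripting, nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv gives a more stable format.

```python
import re

# One process row, e.g. "|    0   N/A  N/A      3181      C   ...nvs/tensorflow/bin/python     7507MiB |"
_ROW = re.compile(
    r"^\|\s*(\d+)\s+\S+\s+\S+\s+(\d+)\s+(C\+G|C|G)\s+(.+?)\s+(\d+)MiB\s*\|\s*$"
)


def compute_processes(smi_output):
    """(pid, process_name, used_MiB) for rows whose Type includes C (compute)."""
    found = []
    for line in smi_output.splitlines():
        match = _ROW.match(line.strip())
        if match and "C" in match.group(3):
            found.append((int(match.group(2)), match.group(4), int(match.group(5))))
    return found
```

For the table above, this returns only the training process, with PID 3181 using 7507 MiB.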
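Install VSCode
Download and install VSCode from its official website.
Open VSCode and install Python support (the Python extension).
Pick a folder (this guide uses ~/tensorflow-notebook/01-hello) and create a new file named hello.ipynb.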
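import tensorflow as tf
# Create a string tensor; TF 2.x runs eagerly, so it is evaluated immediately
hello = tf.constant('Hello, TensorFlow!')
# .numpy() returns the value as Python bytes, displayed as b'Hello, TensorFlow!'
hello.numpy()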
Open the ~/tensorflow-notebook/01-hello/hello.ipynb file you just created in VSCode, and select the tensorflow virtual environment as the Python interpreter.
(Screenshot: running TensorFlow in VSCode)
Summary
The development environment is now set up. Depending on your preference, you can develop from the command line, in JupyterLab, or in VSCode.
Further reading
- Mac machine learning environment (TensorFlow, JupyterLab, VSCode)
- Mac M1 machine learning environment (TensorFlow, JupyterLab, VSCode)
- Win10 machine learning environment (TensorFlow GPU, JupyterLab, VSCode)