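The virtual host I currently use has no GPU, so some GPU-based algorithms cannot be demonstrated online, which is a bit of a shortcoming. After searching around, I found that Tencent Cloud is running a promotion where a GPU instance can be rented cheaply for experiments; other providers are too expensive for an individual. So I bought an instance on Tencent Cloud on a one-month trial to finish the configuration tests.
I have never used Ubuntu, so it will probably take many reinstalls to get this working. Progress is hard to estimate, so I'll try it for a month first, and I'm writing down every step in detail so that I can reinstall at any time. I will install TensorFlow 2.6, PyTorch 1.11.0 and HanLP 2.1, whose versions do not conflict, on CUDA 11.2 and cuDNN 8.5 (cuDNN 8.5 supports CUDA 11.x; in the end I went back to cuDNN 8.1) with Python 3.9. They will then be called from RStudio through the reticulate, tensorflow and keras packages. If time permits, I'll also test the R torch package, which provides PyTorch-like functionality by calling libtorch directly.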
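Tencent Cloud GPU compute instance GN7 with an NVIDIA T4 GPU: 8-core CPU, 32 GB RAM, 100 GB SSD, 1 × T4, 5 Mbps bandwidth, ¥80 for a one-month trial. Trial program: GPU Lab; getting-started tutorial.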
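I. Install the operating system from an image.
The available CUDA versions depend on the GPU driver version; choose driver 460.106.00.
Public image: Ubuntu Server 18.04.1 LTS 64-bit
Install the GPU driver automatically in the background
GPU driver version: 460.106.00
CUDA version: 11.2.2
cuDNN version: 8.2.1
Username: ubuntu
IP addresses: 172.16.XX.XX (private), 106.52.XX.XX (public)
After installation, connect with SecureCRT or PuTTY. The SSH server enables newer key-exchange algorithms, so SecureCRT must be upgraded to version 9.0 or later.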
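1. After logging in, first enable the root account (see reference). Set the root password:
$ sudo passwd root
Switch between accounts:
$ su root
# su ubuntu
As root, allow root to log in over SSH:
# vi /etc/ssh/sshd_config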
Find this section:
# Authentication:
#LoginGraceTime 2m
#PermitRootLogin prohibit-password
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10
Change it to:
# Authentication:
#LoginGraceTime 2m
#PermitRootLogin prohibit-password
PermitRootLogin yes
StrictModes yes
#MaxAuthTries 6
#MaxSessions 10
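The same edit can be scripted, which helps when the instance has to be reinstalled. Below is a minimal Python sketch that assumes the file layout shown above; the function name enable_root_login is hypothetical. If an active PermitRootLogin line exists, it is rewritten. Otherwise PermitRootLogin yes is added right below the commented default, which is what the manual edit does. The uncommented StrictModes yes is already sshd's default, so the sketch leaves that line alone.

```python
import re

# An active (uncommented) directive, e.g. "PermitRootLogin no".
ACTIVE = re.compile(r"^\s*PermitRootLogin\s+\S+")
# The commented default shipped with Ubuntu, e.g. "#PermitRootLogin prohibit-password".
COMMENTED = re.compile(r"^\s*#\s*PermitRootLogin\b")


def enable_root_login(config_text):
    """Return sshd_config text that contains an active 'PermitRootLogin yes'."""
    lines = config_text.splitlines()
    # 1) An active directive exists: rewrite it in place.
    for i, line in enumerate(lines):
        if ACTIVE.match(line):
            lines[i] = "PermitRootLogin yes"
            return "\n".join(lines) + "\n"
    # 2) Only the commented default exists: add the directive right below it,
    #    as in the manual edit above.
    for i, line in enumerate(lines):
        if COMMENTED.match(line):
            lines.insert(i + 1, "PermitRootLogin yes")
            return "\n".join(lines) + "\n"
    # 3) Nothing found: put it first, so it cannot end up inside a Match block.
    return "\n".join(["PermitRootLogin yes"] + lines) + "\n"
```

To use it, read /etc/ssh/sshd_config as root, pass the text through enable_root_login, and write the result back. Then run sshd -t to check the syntax before restarting the service.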
Restart the SSH server:
# systemctl restart sshd.service
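To make installing software easier later on, lift sudo's PATH restriction by commenting out secure_path (see reference). The file is read-only, so save it with :wq!. Editing it through visudo instead also checks the syntax before saving: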
# vi /etc/sudoers
Defaults env_reset
Defaults mail_badpass
# Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
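After that, running sudo with the -E option keeps the current user's environment variables, so software can be installed without switching to root. For example, this is how Python packages are installed with conda later: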
(gpu) ubuntu@VM-0-14-ubuntu:~$ sudo -E conda list hanlp
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name Version Build Channel
hanlp 2.1.0b42 pypi_0 pypi
hanlp-common 0.0.18 pypi_0 pypi
hanlp-downloader 0.0.25 pypi_0 pypi
hanlp-trie 0.0.5 pypi_0 pypi
2. The installation takes about 10 to 15 minutes. Check the running installer processes with:
root@VM-0-14-ubuntu:~# ps aux | grep -i install
root 8158 0.0 0.0 13776 1156 pts/0 S+ 08:50 0:00 grep --color=auto -i install
As shown above, if neither nv_driver_install.sh nor nv_cuda_install.sh appears in the list, the driver installation has finished.
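Instead of re-running ps by hand, the check can be wrapped in a small helper. This is only a sketch. It assumes the installer scripts keep the names nv_driver_install.sh and nv_cuda_install.sh seen on this image, and both function names are hypothetical.

```python
import subprocess

# Installer scripts run by the image, as named above (assumed stable across images).
INSTALLER_SCRIPTS = ("nv_driver_install.sh", "nv_cuda_install.sh")


def running_installers(ps_output):
    """Return the installer scripts that still appear in `ps aux` output."""
    return [name for name in INSTALLER_SCRIPTS if name in ps_output]


def driver_install_finished():
    """Run `ps aux` and return True once neither installer script is listed."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return not running_installers(result.stdout)
```

Calling driver_install_finished() in a loop, with time.sleep(30) between calls, waits until both scripts have exited.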
3. Verify that the GPU driver is installed.
root@VM-0-14-ubuntu:~# nvidia-smi
Sat Oct 29 08:52:11 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:08.0 Off | 0 |
| N/A 28C P8 8W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
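The two version numbers in the nvidia-smi header can also be read programmatically. Note that the "CUDA Version" shown there is the highest CUDA version the driver supports, not the toolkit that is actually installed; that is why the nvcc check is still needed. A minimal sketch follows. The regular expression assumes the header format shown above, and the function names are hypothetical.

```python
import re
import subprocess

# Matches the header line, e.g. "Driver Version: 460.106.00  CUDA Version: 11.2".
HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_smi_header(smi_output):
    """Return (driver_version, highest_supported_cuda) from plain nvidia-smi output."""
    m = HEADER.search(smi_output)
    if m is None:
        raise ValueError("driver/CUDA version not found in nvidia-smi output")
    return m.group(1), m.group(2)


def query_driver():
    """Run nvidia-smi on this machine and parse its header."""
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
    return parse_smi_header(result.stdout)
```

On this instance, query_driver() should return the pair shown in the table above: driver 460.106.00 and CUDA 11.2.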
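4. Verify the CUDA installation. The method in the getting-started tutorial does not work for this combination. There is no version.txt (CUDA 11.2 ships version.json instead, as the directory listing below shows), and /usr/local/cuda is a symlink to /usr/local/cuda-11.2.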
root@VM-0-14-ubuntu:~# cat /usr/local/cuda/version.txt
cat: /usr/local/cuda/version.txt: No such file or directory
root@VM-0-14-ubuntu:~# find / -name cuda
/usr/local/cuda-11.2/targets/x86_64-linux/include/cuda
/usr/local/cuda-11.2/targets/x86_64-linux/include/thrust/system/cuda
/usr/local/cuda
root@VM-0-14-ubuntu:~# cd /usr/local/cuda
root@VM-0-14-ubuntu:/usr/local/cuda# ls
bin DOCS extras lib64 nsight-compute-2020.3.1 nsight-systems-2020.4.3 nvvm README share targets version.json
compute-sanitizer EULA.txt include libnvvp nsightee_plugins nvml nvvm-prev samples src tools
root@VM-0-14-ubuntu:/usr/local/cuda# cd bin
root@VM-0-14-ubuntu:/usr/local/cuda/bin# ./nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
5. Verify the cuDNN installation. The tutorial's method does not apply here either, and in fact cuDNN was not installed from the image:
root@VM-0-14-ubuntu:/usr/local/cuda/bin# cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
cat: /usr/include/cudnn_version.h: No such file or directory
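II. Install cuDNN manually (see reference).
Downloading cuDNN requires logging in to the NVIDIA website, so fetching it directly with wget, as below, does not work: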
wget https://developer.nvidia.com/compute/cudnn/secure/8.5.0/local_installers/11.7/cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
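1. Download the archive on a laptop, upload it to the server over the SSH port with SecureFX, then extract and install it. For the CUDA/cuDNN combinations verified on Linux, see that reference.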
# tar -xvf cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
# cd cudnn-linux-x86_64-8.5.0.96_cuda11-archive
# cp lib/* /usr/local/cuda/lib64/
# cp include/* /usr/local/cuda/include/
# chmod a+r /usr/local/cuda/lib64/*
# chmod a+r /usr/local/cuda/include/*
2. Add the CUDA directories to the global environment variables:
# vi /etc/profile
export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-11.2
3. Run source /etc/profile, or log out and back in, to apply the settings. Then verify the cuDNN installation:
root@VM-0-14-ubuntu:/usr/local/cuda/bin# source /etc/profile
root@VM-0-14-ubuntu:/usr/local/cuda/bin# echo $PATH
/usr/local/cuda-11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
root@VM-0-14-ubuntu:/usr/local/cuda/bin# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
root@VM-0-14-ubuntu:/usr/local/cuda/bin# cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#endif /* CUDNN_VERSION_H */
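The header check can be scripted as well. The sketch below reads the three #define lines and recomputes CUDNN_VERSION with the formula from the header (MAJOR * 1000 + MINOR * 100 + PATCHLEVEL), so 8.5.0 becomes 8500. The default path is the one used above; both function names are hypothetical.

```python
import re
from pathlib import Path


def cudnn_version(header_text):
    """Return (major, minor, patch, CUDNN_VERSION) parsed from cudnn_version.h text."""
    values = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Matches e.g. "#define CUDNN_MAJOR 8" but not the CUDNN_VERSION macro line.
        m = re.search(rf"#define\s+{key}\s+(\d+)", header_text)
        if m is None:
            raise ValueError(f"{key} not found in header")
        values.append(int(m.group(1)))
    major, minor, patch = values
    # Same formula as the CUDNN_VERSION macro in the header.
    return major, minor, patch, major * 1000 + minor * 100 + patch


def installed_cudnn_version(path="/usr/local/cuda/include/cudnn_version.h"):
    """Read the header that was copied into the CUDA include directory."""
    return cudnn_version(Path(path).read_text())
```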
III. Install Anaconda
1. Download Anaconda and install it into /usr/local/anaconda3. Enter this path when the installer asks for the install location.
$ wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2022.10-Linux-x86_64.sh
$ sudo bash Anaconda3-2022.10-Linux-x86_64.sh
When the installation finishes, answer yes to run conda init:
done
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes
modified /usr/local/anaconda3/condabin/conda
modified /usr/local/anaconda3/bin/conda
modified /usr/local/anaconda3/bin/conda-env
no change /usr/local/anaconda3/bin/activate
no change /usr/local/anaconda3/bin/deactivate
no change /usr/local/anaconda3/etc/profile.d/conda.sh
no change /usr/local/anaconda3/etc/fish/conf.d/conda.fish
no change /usr/local/anaconda3/shell/condabin/Conda.psm1
no change /usr/local/anaconda3/shell/condabin/conda-hook.ps1
no change /usr/local/anaconda3/lib/python3.9/site-packages/xontrib/conda.xsh
no change /usr/local/anaconda3/etc/profile.d/conda.csh
modified /root/.bashrc
==> For changes to take effect, close and re-open your current shell. <==
If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:
conda config --set auto_activate_base false
Thank you for installing Anaconda3!
===========================================================================
Working with Python and Jupyter is a breeze in DataSpell. It is an IDE
designed for exploratory data analysis and ML. Get better data insights
with DataSpell.
DataSpell for Anaconda is available at: https://www.anaconda.com/dataspell
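Edit the global profile script and append the conda setup block (the one conda init wrote to /root/.bashrc) to the end, so that conda is available to all users: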
# vi /etc/profile
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/usr/local/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/usr/local/anaconda3/etc/profile.d/conda.sh" ]; then
. "/usr/local/anaconda3/etc/profile.d/conda.sh"
else
export PATH="/usr/local/anaconda3/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
Source ~/.bashrc to activate the conda base environment, or log out and back in:
# source ~/.bashrc
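2. As root, create the "gpu" environment and install tensorflow-gpu 2.6. Activate the new environment before installing ipykernel, so that the kernel registered as "gpu" runs the environment's own Python rather than the base one:
# conda create --name gpu python=3.9
# conda activate gpu
# pip install ipykernel
# python -m ipykernel install --user --name gpu
# pip install tensorflow-gpu==2.6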
3. Test the installation as user ubuntu.
(base) ubuntu@VM-0-14-ubuntu:~$ conda activate gpu
(gpu) ubuntu@VM-0-14-ubuntu:~$ python
Python 3.9.13 (main, Oct 13 2022, 21:15:33)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_built_with_cuda()
True
>>> a = tf.constant(1.)
2022-10-29 18:14:29.577429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.585025: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.585898: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.587034: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-29 18:14:29.587744: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.588624: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:29.589442: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.245462: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.246301: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.247122: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-29 18:14:30.247901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13803 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:00:08.0, compute capability: 7.5
>>> b = tf.constant(2.)
>>> print(a+b)
tf.Tensor(3.0, shape=(), dtype=float32)
>>>
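IV. Configure Jupyter Notebook
Jupyter Notebook is easier to set up, so configure it first and use it to verify the GPU environment (see reference).
1. Anaconda3 installed Jupyter Notebook in the base environment, but not in the "gpu" virtual environment created above. Activate that environment with conda activate first, then install it: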
(base) root@VM-0-14-ubuntu:~# conda activate gpu
(gpu) root@VM-0-14-ubuntu:~# conda list jupyter
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name Version Build Channel
(gpu) root@VM-0-14-ubuntu:~# conda install jupyter notebook
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /usr/local/anaconda3/envs/gpu
added / updated specs:
- jupyter
- notebook
The following packages will be downloaded:
package | build
---------------------------|-----------------
asttokens-2.0.5 | pyhd3eb1b0_0 20 KB
......
Proceed ([y]/n)? y
Downloading and Extracting Packages
soupsieve-2.3.2.post | 65 KB | ################################################################################################################################################## | 100%
......
asttokens-2.0.5 | 20 KB | ################################################################################################################################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Retrieving notices: ...working... done
2. Configure Jupyter Notebook for user ubuntu.
1) Generate the configuration file.
(base) ubuntu@VM-0-14-ubuntu:~$ jupyter notebook --generate-config
Writing default config to: /home/ubuntu/.jupyter/jupyter_notebook_config.py
2) Generate the hash of the login password.
(base) ubuntu@VM-0-14-ubuntu:~$ python
Python 3.9.13 (main, Aug 25 2022, 23:26:10)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd
>>> passwd()
Enter password:
Verify password:
'argon2:$argon2id$v=19$m=10240,t=10,p=xxxxxxxxxxxxxxxxxxx'
>>>
3. Edit the configuration file and paste the password hash from above into it.
$ vi ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.ip = '*'  # accept connections on all IP addresses
c.NotebookApp.password = 'argon2:$argon2id$v=19$m=10240,t=10,p=xxxxxxxxxxxxxxxxxxx'  # the argon2 hash copied above
c.NotebookApp.open_browser = False  # do not open a browser automatically
c.NotebookApp.port = 8888  # port
c.NotebookApp.notebook_dir = '/home/ubuntu/jupyternotebook'  # directory Notebook opens at startup
4. Start Jupyter Notebook. Activate the "gpu" environment first, because that is the one to use:
(base) ubuntu@VM-0-14-ubuntu:~$ conda activate gpu
(gpu) ubuntu@VM-0-14-ubuntu:~$ conda list jupyter
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name Version Build Channel
jupyter 1.0.0 py39h06a4308_8
jupyter_client 7.3.5 py39h06a4308_0
jupyter_console 6.4.3 pyhd3eb1b0_0
jupyter_core 4.11.1 py39h06a4308_0
jupyter_server 1.18.1 py39h06a4308_0
jupyterlab 3.4.4 py39h06a4308_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 2.15.2 py39h06a4308_0
jupyterlab_widgets 1.0.0 pyhd3eb1b0_1
(gpu) ubuntu@VM-0-14-ubuntu:~$ jupyter notebook &
[1] 16510
(gpu) ubuntu@VM-0-14-ubuntu:~$ [W 07:53:21.094 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 2022-10-30 07:53:21.326 LabApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'password' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'password' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'notebook_dir' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2022-10-30 07:53:21.326 LabApp] 'notebook_dir' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[I 2022-10-30 07:53:21.333 LabApp] JupyterLab extension loaded from /usr/local/anaconda3/envs/gpu/lib/python3.9/site-packages/jupyterlab
[I 2022-10-30 07:53:21.333 LabApp] JupyterLab application directory is /usr/local/anaconda3/envs/gpu/share/jupyter/lab
[I 07:53:21.337 NotebookApp] Serving notebooks from local directory: /home/ubuntu/jupyternotebook
[I 07:53:21.337 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 07:53:21.337 NotebookApp] http://VM-0-14-ubuntu:8888/
[I 07:53:21.337 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
5. Open it in a browser, log in with the password set above, and create a test notebook to check the GPU environment:
import tensorflow as tf
tf.test.is_built_with_cuda()
a = tf.constant(1.)
b = tf.constant(2.)
print(a+b)
6. Create another test notebook to test Keras and cuDNN:
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, optimizers, datasets
from tensorflow.keras.models import load_model
from matplotlib import pyplot as plt
import numpy as np
# 1. Prepare the dataset
# Load MNIST
(x_train_raw, y_train_raw), (x_test_raw, y_test_raw) = datasets.mnist.load_data()
print(y_train_raw[0])  # 5
print(x_train_raw.shape, y_train_raw.shape)  # (60000, 28, 28): 60,000 training images
print(x_test_raw.shape, y_test_raw.shape)  # (10000, 28, 28): 10,000 test images
num_classes = 10
y_train = keras.utils.to_categorical(y_train_raw, num_classes)  # convert class labels to one-hot vectors
y_test = keras.utils.to_categorical(y_test_raw, num_classes)
print(y_train[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
# Visualize a few training images
plt.figure()
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(x_train_raw[i])
    plt.axis('off')
plt.show()
# 2. Build and compile a fully connected network
x_train = x_train_raw.reshape(60000, 784)  # flatten each 28x28 image into a 784-element vector
x_test = x_test_raw.reshape(10000, 784)
x_train = x_train.astype('float32') / 255  # scale pixel values to 0~1
x_test = x_test.astype('float32') / 255
model = keras.Sequential([  # four Dense layers: three ReLU hidden layers and a softmax output
    layers.Dense(512, activation='relu', input_dim=784),  # reduce 784 inputs to 512 units
    layers.Dense(256, activation='relu'),
    layers.Dense(124, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])
# 3. Train the network
Optimizer = optimizers.Adam(0.001)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=Optimizer,  # Adam optimizer
              metrics=['accuracy'])
model.fit(x_train, y_train,  # training data and labels
          batch_size=128,  # batch size
          epochs=10,  # number of epochs
          verbose=1)  # print training progress
# 4. Evaluate the model
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])  # loss: 0.0853068439
print('Test accuracy:', score[1])  # accuracy: 0.9767
test_loss, test_acc = model.evaluate(x=x_test, y=y_test)
print("Test Accuracy %.2f" % test_acc)  # same accuracy, printed with two decimals
# 5. Save the model
model.save('./final_DNN_model.h5')  # save the DNN model
# 6. Load the saved model
new_model = load_model('./final_DNN_model.h5')
new_model.summary()
# 7. CNN model test ------------------------------------------------------------
# Add a channel dimension so the data fits the CNN input
X_train = x_train.reshape(60000, 28, 28, 1)
X_test = x_test.reshape(10000, 28, 28, 1)
# Define the convolutional neural network
model = keras.Sequential([  # build the layer sequence
    # first convolution layer and pooling layer
    layers.Conv2D(filters=32, kernel_size=5, strides=(1, 1), padding='same', activation=tf.nn.relu, input_shape=(28, 28, 1)),
    layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding='valid'),
    # second convolution layer and pooling layer
    layers.Conv2D(filters=64, kernel_size=3, strides=(1, 1), padding='same', activation=tf.nn.relu),
    layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding='valid'),
    # dropout layer to reduce overfitting
    layers.Dropout(0.25),  # fraction of units dropped at random
    layers.Flatten(),
    # two fully connected layers
    layers.Dense(units=128, activation=tf.nn.relu),
    layers.Dropout(0.5),
    layers.Dense(units=10, activation=tf.nn.softmax)
])
# Compile and train the model
Optimizer = optimizers.Adam(0.001)
model.compile(Optimizer, loss="categorical_crossentropy", metrics=['accuracy'])
model.fit(x=X_train, y=y_train, epochs=5, batch_size=128)  # 5 epochs
# Save the CNN model
model.save('./final_CNN_model.h5')
# Load the saved model
new_model = load_model('./final_CNN_model.h5')
# 8. Visualize predictions on the test data
# %matplotlib inline
def res_Visual(n):
    # See https://blog.csdn.net/yiyihuazi/article/details/122323349
    # Keras 2.6 removed predict_classes():
    # final_opt_a = new_model.predict_classes(X_test[0:n])  # predict on the test set
    # use the following instead
    predicts = new_model.predict(X_test[0:n])
    final_opt_a = np.argmax(predicts, axis=1)
    fig, ax = plt.subplots(nrows=int(n / 5), ncols=5)
    ax = ax.flatten()
    print('Predictions for the first {} images:'.format(n))
    for i in range(n):
        print(final_opt_a[i], end='.')
        if int((i + 1) % 5) == 0:
            print('\t')
        # show the image
        img = X_test[i].reshape((28, 28))  # one test sample as a 28x28 ndarray
        plt.axis("off")
        ax[i].imshow(img, cmap='Greys', interpolation='nearest')  # visualize
        ax[i].axis("off")
    print('First {} test images:'.format(n))
res_Visual(20)
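Keras has to be downgraded to 2.6.0. pip had installed Keras 2.10.0 together with TensorFlow 2.6, which fails with the error below (see reference):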
ImportError: cannot import name 'dtensor' from 'tensorflow.compat.v2.experimental'
(gpu) root@VM-0-14-ubuntu:~# conda list keras
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name Version Build Channel
keras 2.10.0 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
(gpu) root@VM-0-14-ubuntu:~# pip install keras==2.6
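After that, the fully connected DNN part of the test program passed, but the CNN part, which uses cuDNN, failed. cuDNN 8.5 is probably too new (see reference), so it has to be downgraded to 8.1, the version tested with TensorFlow 2.6. The error was: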
OP_REQUIRES failed at conv_ops.cc:1276 : Not found: No algorithm worked!
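One way to catch this kind of mismatch before running a model is to compare the installed versions with the configuration TensorFlow 2.6 was tested against. The sketch below hard-codes a single entry, CUDA 11.2 with cuDNN 8.1 for TensorFlow 2.6, taken from TensorFlow's tested build configurations. The table and function names are hypothetical helpers, and any other TensorFlow version would need its own entry.

```python
# Tested GPU build configuration for TensorFlow 2.6 (CUDA 11.2, cuDNN 8.1).
# Hypothetical lookup table: add entries for other TensorFlow versions as needed.
TESTED = {"2.6": {"CUDA": (11, 2), "cuDNN": (8, 1)}}


def _fmt(version):
    """Format a (major, minor) tuple as 'major.minor'."""
    return ".".join(str(part) for part in version)


def check_against_tested(tf_version, cuda, cudnn):
    """Return a list of mismatches; an empty list means the versions match.

    tf_version is a string such as "2.6.0"; cuda and cudnn are (major, minor) tuples.
    """
    key = ".".join(tf_version.split(".")[:2])
    expected = TESTED.get(key)
    if expected is None:
        return [f"no tested configuration recorded for TensorFlow {key}"]
    problems = []
    for name, installed in (("CUDA", cuda), ("cuDNN", cudnn)):
        if tuple(installed) != expected[name]:
            problems.append(f"{name} {_fmt(installed)} differs from tested {_fmt(expected[name])}")
    return problems
```

With the CUDNN_MAJOR/CUDNN_MINOR values printed earlier (8 and 5), check_against_tested("2.6.0", (11, 2), (8, 5)) reports the cuDNN mismatch.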
7. Downgrade cuDNN to 8.1. Download it on the laptop, upload it to the server over SSH with SecureFX, and copy its files over the cuDNN 8.5 ones:
# tar -xvf cudnn-11.2-linux-x64-v8.1.1.33.tgz
# cd cuda
# cp -f lib64/* /usr/local/cuda/lib64/
# cp -f include/* /usr/local/cuda/include/
# chmod a+r /usr/local/cuda/lib64/*
# chmod a+r /usr/local/cuda/include/*
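Add the following setting to the global environment variables. Without it, the CNN test may fail with an error saying the requested memory is too large. The export is needed so that processes started from the shell, such as Jupyter, inherit the variable:
# vi /etc/profile
export TF_GPU_ALLOCATOR=cuda_malloc_async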
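/etc/profile is only read by login shells, so a kernel started some other way may not see the variable. A per-process alternative is to set it from Python before TensorFlow is imported, sketched below under that assumption. configure_tf_allocator is a hypothetical helper, and it keeps any value already exported from /etc/profile.

```python
import os
import sys
import warnings


def configure_tf_allocator(allocator="cuda_malloc_async"):
    """Set TF_GPU_ALLOCATOR for this process unless it is already set."""
    if "tensorflow" in sys.modules:
        # The allocator is chosen when TensorFlow sets up the GPU, so setting
        # the variable before the import is the safe choice.
        warnings.warn("tensorflow is already imported; TF_GPU_ALLOCATOR may be ignored")
    return os.environ.setdefault("TF_GPU_ALLOCATOR", allocator)


configure_tf_allocator()
# import tensorflow as tf  # import TensorFlow only after this point
```

This would go in the first cell of a notebook, before any TensorFlow import.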
Refresh the shared-library cache (otherwise the wrong libraries are linked), then reboot:
# ldconfig -X
# reboot now
Log in as ubuntu, activate the "gpu" environment and start Jupyter Notebook:
$ conda activate gpu
$ jupyter notebook &
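8. Re-run the test notebook above; this time the GPU environment test passes.
V. Install PyTorch and HanLP
I install TensorFlow, PyTorch and HanLP into the same virtual environment "gpu", because I want to run HanLP 2.1, which supports both backends.
1. Install PyTorch. The conda package brings its own CUDA runtime (cudatoolkit 11.3), separate from the system CUDA 11.2 installed above: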
(gpu) root@VM-0-14-ubuntu:~# conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /usr/local/anaconda3/envs/gpu
added / updated specs:
- cudatoolkit=11.3
- pytorch==1.11.0
- torchaudio==0.11.0
- torchvision==0.12.0
The following packages will be downloaded:
package | build
---------------------------|-----------------
cudatoolkit-11.3.1 | h2bc3f7f_2 549.3 MB
......
torchvision pytorch/linux-64::torchvision-0.12.0-py39_cu113 None
Proceed ([y]/n)? y
Downloading and Extracting Packages
lame-3.100 | 323 KB | ################################################################################################################################################## | 100%
......
######################################################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: | By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html
done
Retrieving notices: ...working... done
2. Install HanLP.
(gpu) root@VM-0-14-ubuntu:~# pip install hanlp
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting hanlp
......
Successfully built hanlp-common hanlp-trie hanlp-downloader phrasetree
Installing collected packages: toposort, tokenizers, phrasetree, tqdm, regex, pyyaml, pynvml, hanlp-common, filelock, huggingface-hub, hanlp-trie, hanlp-downloader, transformers, hanlp
Successfully installed filelock-3.8.0 hanlp-2.1.0b42 hanlp-common-0.0.18 hanlp-downloader-0.0.25 hanlp-trie-0.0.5 huggingface-hub-0.10.1 phrasetree-0.0.8 pynvml-11.4.1 pyyaml-6.0 regex-2022.9.13 tokenizers-0.11.6 toposort-1.5 tqdm-4.64.1 transformers-4.23.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Install fasttext, which some of HanLP's TensorFlow pretrained models require:
(gpu) root@VM-0-14-ubuntu:~# pip install fasttext
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting fasttext
Downloading http://mirrors.tencentyun.com/pypi/packages/f8/85/e2b368ab6d3528827b147fdb814f8189acc981a4bc2f99ab894650e05c40/fasttext-0.9.2.tar.gz (68 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 332.3 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting pybind11>=2.2
Using cached http://mirrors.tencentyun.com/pypi/packages/1d/53/e6b27f3596278f9dd1d28ef1ddb344fd0cd5db98ef2179d69a2044e11897/pybind11-2.10.1-py3-none-any.whl (216 kB)
Requirement already satisfied: setuptools>=0.7.0 in /usr/local/anaconda3/envs/gpu/lib/python3.9/site-packages (from fasttext) (65.5.0)
Requirement already satisfied: numpy in /usr/local/anaconda3/envs/gpu/lib/python3.9/site-packages (from fasttext) (1.23.3)
Building wheels for collected packages: fasttext
Building wheel for fasttext (setup.py) ... done
Created wheel for fasttext: filename=fasttext-0.9.2-cp39-cp39-linux_x86_64.whl size=299146 sha256=4dee6f6dc5fb53404fb5cbb69c2cc3a2faef7f3af0500567ad49dc01f26d89d7
Stored in directory: /root/.cache/pip/wheels/ca/08/ee/d0dd871c6c089c4c3971722067bd577f8827c9b4d5d6f2477a
Successfully built fasttext
Installing collected packages: pybind11, fasttext
3. Test PyTorch and HanLP.
A quick test first; more tests will follow later.
import torch
print(torch.__version__)
print(torch.cuda.is_available())
# Running the TensorFlow model first and the PyTorch models afterwards works; if a PyTorch model has already been run in this session, this step fails.
import hanlp
tokenizer = hanlp.load(hanlp.pretrained.tok.LARGE_ALBERT_BASE)
text = 'NLP统计模型没有加规则,聪明人知道自己加。英文、数字、自定义词典统统都是规则。'
print(tokenizer(text))
# The following tests do not depend on the execution order
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH) # trained on the world's largest Chinese corpus
HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。'])
import hanlp
HanLP = hanlp.pipeline() \
.append(hanlp.utils.rules.split_sentence, output_key='sentences') \
.append(hanlp.load('FINE_ELECTRA_SMALL_ZH'), output_key='tok') \
.append(hanlp.load('CTB9_POS_ELECTRA_SMALL'), output_key='pos') \
.append(hanlp.load('MSRA_NER_ELECTRA_SMALL_ZH'), output_key='ner', input_key='tok') \
.append(hanlp.load('CTB9_DEP_ELECTRA_SMALL', conll=0), output_key='dep', input_key='tok')\
.append(hanlp.load('CTB9_CON_ELECTRA_SMALL'), output_key='con', input_key='tok')
HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。')
HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。').pretty_print()
import hanlp
tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)
tok(['商品和服务。', '阿婆主来到北京立方庭参观自然语义科技公司。'])
tok_fine = hanlp.load(hanlp.pretrained.tok.FINE_ELECTRA_SMALL_ZH)
tok_fine('阿婆主来到北京立方庭参观自然语义科技公司')
pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)
pos(["我", "的", "希望", "是", "希望", "张晚霞", "的", "背影", "被", "晚霞", "映红", "。"])
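VI. Install and configure JupyterHub
A Linux GPU host used for research, development, testing or production naturally has several users. Jupyter Notebook is single-user. JupyterHub adds a multi-user proxy layer through which everyone logs in to the system and uses their own Jupyter Notebook or JupyterLab (JupyterLab is the next generation of Jupyter Notebook).
According to that post, running Jupyter Notebook earlier leaves configuration files under $HOME/.jupyter. These conflict with the per-user JupyterLab or Jupyter Notebook server that JupyterHub starts, so the server process fails to start and proxying fails. Is this a bug? Either way, if you have run Jupyter Notebook before (as above), delete that directory first. This problem took me two days and nearly drove me mad; Stack Overflow came to the rescue.
1. Install and upgrade Node.js and npm.
# # Fetch the latest package lists and upgrade system packages
# apt-get update
# apt-get upgrade
# # Install dependencies
# apt install -y npm nodejs
Upgrade Node.js, but do not install the latest version 18: it has compatibility problems here and fails with errors (see reference). The likely cause is that the official Node.js 18 binaries require glibc 2.28, while Ubuntu 18.04 ships glibc 2.27. JupyterHub needs version 10 or later, whereas Ubuntu 18.04 installs version 8.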
##----- Clear the npm cache first
# npm cache clean -f
##----- Install the n module
# npm install -g n
Upgrade Node.js:
root@VM-0-14-ubuntu:~# n 16.18.0 # pin version 16.18.0
installing : node-v16.18.0
mkdir : /usr/local/n/versions/node/16.18.0
fetch : https://nodejs.org/dist/v16.18.0/node-v16.18.0-linux-x64.tar.xz
copying : node/16.18.0
installed : v16.18.0 (with npm 8.19.2)
Note: the node command changed location and the old location may be remembered in your current shell.
old : /usr/bin/node
new : /usr/local/bin/node
If "node --version" shows the old version then start a new shell, or reset the location hash with:
hash -r (for bash, zsh, ash, dash, and ksh)
rehash (for csh and tcsh)
root@VM-0-14-ubuntu:~# hash -r
root@VM-0-14-ubuntu:~# node -v
v16.18.0
root@VM-0-14-ubuntu:~# npm -v
8.19.2
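To guard against an accidental upgrade to version 18 later, the version constraint described above can be checked against the `node -v` output: at least 10, and not 18 on this Ubuntu 18.04 host. This is a sketch. Treating every release from 18 upward as unsupported is an assumption extrapolated from the version-18 failure, and the function names are hypothetical.

```python
import re
import subprocess


def node_version_ok(version_text):
    """Check `node -v` output such as 'v16.18.0' against the constraints above.

    JupyterHub needs at least 10; 18 failed on this Ubuntu 18.04 host, and
    newer majors are assumed to fail the same way.
    """
    m = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version_text.strip())
    if m is None:
        raise ValueError(f"unexpected version string: {version_text!r}")
    major = int(m.group(1))
    return 10 <= major < 18


def installed_node_ok():
    """Run `node -v` and check the result."""
    result = subprocess.run(["node", "-v"], capture_output=True, text=True, check=True)
    return node_version_ok(result.stdout)
```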
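2. Install configurable-http-proxy.
It can be installed with npm:
npm install -g configurable-http-proxy
However, installing it with conda is recommended, since that pulls in the other dependencies as well. It also installs its own Node.js (version 11), which works too. Make sure to activate the corresponding virtual environment and install into it; here that is "gpu".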
(gpu) root@VM-0-14-ubuntu:~# conda install configurable-http-proxy
(gpu) root@VM-0-14-ubuntu:~# conda list configurable-http-proxy
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name Version Build Channel
configurable-http-proxy 4.0.1 node6_0
(gpu) root@VM-0-14-ubuntu:~# configurable-http-proxy -V
4.0.1
(gpu) root@VM-0-14-ubuntu:~#
3. Install JupyterHub and related packages in the virtual environment.
(gpu) root@VM-0-14-ubuntu:~# conda install jupyter jupyterlab jupyterhub
(gpu) root@VM-0-14-ubuntu:~# conda list jupyter
# packages in environment at /usr/local/anaconda3/envs/gpu:
#
# Name Version Build Channel
jupyter 1.0.0 py39h06a4308_8
jupyter_client 7.3.5 py39h06a4308_0
jupyter_console 6.4.3 pyhd3eb1b0_0
jupyter_core 4.11.1 py39h06a4308_0
jupyter_server 1.18.1 py39h06a4308_0
jupyter_telemetry 0.1.0 py_0
jupyterhub 2.0.0 pyhd3eb1b0_0
jupyterlab 3.4.4 py39h06a4308_0
jupyterlab_pygments 0.1.2 py_0
jupyterlab_server 2.15.2 py39h06a4308_0
jupyterlab_widgets 1.0.0 pyhd3eb1b0_1
4. Configure JupyterHub.
Create the directory /etc/jupyterhub, generate a configuration file in it, and edit the file.
(gpu) root@VM-0-14-ubuntu:~# mkdir /etc/jupyterhub
(gpu) root@VM-0-14-ubuntu:~# cd /etc/jupyterhub
(gpu) root@VM-0-14-ubuntu:/etc/jupyterhub# jupyterhub --generate-config
Writing default config to: jupyterhub_config.py
(gpu) root@VM-0-14-ubuntu:/etc/jupyterhub# vi jupyterhub_config.py
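Its contents:
# Added by Jean 2022/10/31
c.Authenticator.allowed_users = {'ubuntu'} # users allowed to use JupyterHub, comma-separated (named whitelist before JupyterHub 1.2)
c.Authenticator.admin_users = {'ubuntu'} # JupyterHub administrators
c.Spawner.notebook_dir = '/home/{username}' # open the user's home directory after logging in via the browser
c.Spawner.default_url = '/lab' # use JupyterLab instead of the classic Notebook
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log' # deprecated option; JupyterHub suggests redirecting its output to a file instead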
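With JupyterHub's default PAM authenticator, users log in with their Linux accounts and passwords (step 6 below logs in with an existing Linux user name). Every name in allowed_users and admin_users must therefore exist on the system. A minimal check follows; missing_system_users is a hypothetical helper built on Python's standard pwd module.

```python
import pwd


def missing_system_users(users):
    """Return the names in `users` that have no account on this Linux system."""
    missing = set()
    for name in users:
        try:
            pwd.getpwnam(name)  # raises KeyError if the account does not exist
        except KeyError:
            missing.add(name)
    return missing
```

On this host, missing_system_users({'ubuntu'}) should return an empty set; any name it returns would be unable to log in.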
5. Start JupyterHub in the background as root.
(gpu) root@VM-0-14-ubuntu:/etc/jupyterhub# jupyterhub -f /etc/jupyterhub/jupyterhub_config.py &
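6. Open http://ip:8000 in a browser and log in with a user name that already exists on the Linux system. SSL encryption is configured later.
JupyterHub can open terminal windows for all kinds of operations, running as the logged-in user. If the SSH port is blocked, this still gives shell access over the HTTP port. Running su switches to root: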
(base) ubuntu@VM-0-14-ubuntu:~$ su --help
Usage: su [options] [LOGIN]
Options:
-c, --command COMMAND pass COMMAND to the invoked shell
-h, --help display this help message and exit
-, -l, --login make the shell a login shell
-m, -p,
--preserve-environment do not reset environment variables, and
keep the same shell
-s, --shell SHELL use SHELL instead of the default in passwd
(base) ubuntu@VM-0-14-ubuntu:~$ su --preserve-environment
Password:
(base) root@VM-0-14-ubuntu:~#
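7. Configure SSL encryption.
Once SSL is configured, the login page is served over an encrypted connection. Clicking the lock icon in front of the URL shows the certificate chain, whereas over an unencrypted connection the browser shows a "Not secure" warning in front of the URL. The self-signed certificate here is issued to the IP address, because this host has no domain name yet.
1) First, the JupyterHub side. Adding two lines to the configuration file that point to the server's key file and certificate file is enough; creating a CA with openssl and issuing this certificate is covered afterwards. Since everything runs as root, server.key has no passphrase.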
# Added by Jean for SSL 2022/03/19
c.JupyterHub.ssl_key = '/root/cert/server.key'
c.JupyterHub.ssl_cert = '/root/cert/server.crt'
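After restarting JupyterHub, export the root certificate of the self-built CA and import it into the browser (described later), then open https://ip:8000.
2) Build a CA and issue a server certificate signed by it.
See reference.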
(gpu) root@VM-0-14-ubuntu:~# cd /root
(gpu) root@VM-0-14-ubuntu:~# mkdir cert
(gpu) root@VM-0-14-ubuntu:~# cd cert
(gpu) root@VM-0-14-ubuntu:~/cert# mkdir demoCA && cd demoCA
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# mkdir private newcerts
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# touch index.txt
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# echo '01' > serial
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# cd private
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# openssl genrsa -out cakey.pem 2048
Generating RSA private key, 2048 bit long modulus (2 primes)
...............................................................................+++++
....................+++++
e is 65537 (0x010001)
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# openssl req -sha256 -new -x509 -days 3650 -key cakey.pem -out cacert.pem \
> -subj "/C=CN/ST=GD/L=ZhuHai/O=Jean/OU=Study/CN=RootCA"
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# ls
cacert.pem cakey.pem
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA/private# cd .. && mv ./private/cacert.pem ./
(gpu) root@VM-0-14-ubuntu:~/cert/demoCA# ls
cacert.pem index.txt newcerts private serial
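The commands above do the following:
A. Create the directory /root/cert under root's home directory /root.
B. Create the CA directory structure ./demoCA inside it (private/, newcerts/, index.txt and serial). OpenSSL's default configuration file expects the CA in ./demoCA under the current directory.
C. Generate the CA private key cakey.pem.
D. Issue the CA's self-signed certificate cacert.pem and move it into ./demoCA. When the CA later signs the server certificate, OpenSSL looks for the root certificate there; this is OpenSSL's default configuration.
E. Finally, list the demoCA directory structure.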
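The directory skeleton from steps A and B can also be created from Python, for example when the CA has to be rebuilt after a reinstall. This is a sketch of the mkdir/touch/echo commands above only; generating keys and certificates is still left to the openssl commands. init_demo_ca is a hypothetical helper. Restricting private/ to mode 700 is an extra precaution the shell version does not take.

```python
from pathlib import Path


def init_demo_ca(base_dir):
    """Create the ./demoCA layout that the [CA_default] section of openssl.cnf expects."""
    ca = Path(base_dir) / "demoCA"
    (ca / "private").mkdir(parents=True, exist_ok=True)
    (ca / "private").chmod(0o700)          # keep the CA key directory private
    (ca / "newcerts").mkdir(exist_ok=True)
    (ca / "index.txt").touch()              # empty certificate database
    serial = ca / "serial"
    if not serial.exists():                 # never reset an existing serial counter
        serial.write_text("01\n")           # same content as `echo '01' > serial`
    return ca
```

Calling init_demo_ca("/root/cert") corresponds to the layout listed at the end of the session above, before the key and certificate are added.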
To confirm, locate OpenSSL's default configuration file; its [CA_default] section places the CA in ./demoCA under the current directory:
(gpu) root@VM-0-14-ubuntu:~# find / -name openssl.cnf
/usr/lib/ssl/openssl.cnf
/usr/local/anaconda3/pkgs/openssl-1.1.1q-h7f8727e_0/ssl/openssl.cnf
/usr/local/anaconda3/ssl/openssl.cnf
/usr/local/anaconda3/envs/gpu/ssl/openssl.cnf
/usr/local/anaconda3/envs/hub/ssl/openssl.cnf
/etc/ssl/openssl.cnf
(gpu) root@VM-0-14-ubuntu:~# vi /usr/lib/ssl/openssl.cnf
####################################################################
[ ca ]
default_ca = CA_default # The default ca section
####################################################################
[ CA_default ]
dir = ./demoCA # Where everything is kept
certs = $dir/certs # Where the issued certs are kept
crl_dir = $dir/crl # Where the issued crl are kept
database = $dir/index.txt # database index file.
#unique_subject = no # Set to 'no' to allow creation of
# several certs with same subject.
new_certs_dir = $dir/newcerts # default place for new certs.
certificate = $dir/cacert.pem # The CA certificate
serial = $dir/serial # The current serial number
crlnumber = $dir/crlnumber # the current crl number
# must be commented out to leave a V1 CRL
crl = $dir/crl.pem # The current CRL
private_key = $dir/private/cakey.pem# The private key
RANDFILE = $dir/private/.rand # private random number file
x509_extensions = usr_cert # The extensions to add to the cert
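Because dir = ./demoCA is relative, every $dir path in [ CA_default ] resolves against the directory openssl is run from, which is why the signing commands below are run from /root/cert. A simplified Python sketch that reads one section of an openssl.cnf-style text and expands $dir, useful for seeing which files openssl ca will look for. It is an assumption-laden approximation, not OpenSSL's parser: it ignores $ENV::NAME, cross-section references and line continuations. The function name read_ca_section is hypothetical.

```python
import re


def read_ca_section(text: str, section: str = "CA_default") -> dict[str, str]:
    """Extract `key = value` pairs from one section and expand $var references
    to keys defined earlier in the same section."""
    values: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        # Drop comments, including the "cakey.pem# The private key" form.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        header = re.fullmatch(r"\[\s*([^\]]+?)\s*\]", line)
        if header:
            current = header.group(1)
            continue
        if current != section or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Replace $dir / ${dir}; unknown names are left untouched.
        value = re.sub(
            r"\$\{?(\w+)\}?",
            lambda m: values.get(m.group(1), m.group(0)),
            value,
        )
        values[key] = value
    return values
```

For the excerpt above, certificate expands to ./demoCA/cacert.pem and private_key to ./demoCA/private/cakey.pem.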
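F. Generate the server's private key and certificate signing request (CSR).
Following posts 1 and 2, first run the command below to create the file /root/.rnd; otherwise the commands that generate the server key and request will report an error.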
openssl rand -out /root/.rnd -hex 256
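If openssl is not at hand, the same seed file can be written from Python. A minimal sketch, assuming that any random content is acceptable as the RANDFILE seed; like openssl rand -hex 256, it writes 256 random bytes as 512 hex characters followed by a newline. The function name write_rand_file is hypothetical.

```python
import os
import secrets


def write_rand_file(path: str = os.path.expanduser("~/.rnd"), nbytes: int = 256) -> None:
    """Write `nbytes` random bytes as hex text, like `openssl rand -out PATH -hex 256`."""
    with open(path, "w") as f:
        f.write(secrets.token_hex(nbytes) + "\n")
    os.chmod(path, 0o600)  # seed material: keep it private to the owner
```

Run as root, the default path resolves to /root/.rnd.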
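Change to /root/cert, the parent directory of ./demoCA, and run the commands below to generate the server's private key and certificate signing request. The request is generated with the configuration file /usr/lib/ssl/openssl.cnf plus an extra Subject Alternative Name (SAN) extension, because Chrome checks the certificate's SAN (not its CN) against the address in the URL. Since the server is accessed as https://ip, the SAN here is IP.1:106.52.33.185, meaning the first IP address the certificate covers; further entries such as IP.2 can follow. To certify a domain name instead, use an entry such as DNS.1:jeanye.cn, and so on. The output is the request file server.csr.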
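When a certificate needs several IPs or domain names, the inline printf string gets long. A minimal Python sketch that builds the same [SAN] section in the documented @alt_names form (IP.1, IP.2, ..., DNS.1, ...); the section names SAN and alt_names are arbitrary labels, and the function name san_section is hypothetical.

```python
import ipaddress


def san_section(ips=(), dns_names=(), name="SAN"):
    """Build an OpenSSL config fragment that defines a subjectAltName extension."""
    if not ips and not dns_names:
        raise ValueError("at least one IP or DNS name is required")
    lines = [f"[{name}]", "subjectAltName=@alt_names", "[alt_names]"]
    for i, ip in enumerate(ips, start=1):
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        lines.append(f"IP.{i} = {ip}")
    for i, host in enumerate(dns_names, start=1):
        lines.append(f"DNS.{i} = {host}")
    return "\n".join(lines) + "\n"
```

The output would be appended to a copy of /usr/lib/ssl/openssl.cnf in place of the printf string, keeping -reqexts SAN for openssl req and -extensions SAN for openssl ca.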
(gpu) root@VM-0-14-ubuntu:~/cert# openssl genrsa -out server.key 2048
(gpu) root@VM-0-14-ubuntu:~/cert# openssl req -new \
> -sha256 \
> -key server.key \
> -subj "/C=CN/ST=GD/L=ZhuHai/O=Jean/OU=Study/CN=106.52.33.185" \
> -reqexts SAN \
> -config <(cat /usr/lib/ssl/openssl.cnf \
> <(printf "[SAN]\nsubjectAltName=IP.1:106.52.33.185")) \
> -out server.csr
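The <(...) syntax above is bash process substitution: it does not exist in plain sh, and the merged config disappears once the command finishes. A minimal Python sketch that instead writes the merged config to an ordinary file and builds the equivalent openssl req argument list; the function name build_req_argv and the file name req-san.cnf are hypothetical.

```python
from pathlib import Path


def build_req_argv(base_cnf: str, san_ip: str, workdir: str) -> list[str]:
    """Write base config + [SAN] section to workdir/req-san.cnf and return the
    argv equivalent to the `openssl req -new ... -reqexts SAN` command above."""
    merged = Path(workdir) / "req-san.cnf"
    merged.write_text(
        Path(base_cnf).read_text() + f"\n[SAN]\nsubjectAltName=IP.1:{san_ip}\n"
    )
    subj = f"/C=CN/ST=GD/L=ZhuHai/O=Jean/OU=Study/CN={san_ip}"
    return [
        "openssl", "req", "-new", "-sha256",
        "-key", str(Path(workdir) / "server.key"),
        "-subj", subj,
        "-reqexts", "SAN",
        "-config", str(merged),
        "-out", str(Path(workdir) / "server.csr"),
    ]
```

It would be run with subprocess.run(build_req_argv("/usr/lib/ssl/openssl.cnf", "106.52.33.185", "/root/cert"), check=True); the same req-san.cnf can then be passed to openssl ca -extensions SAN -config.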
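G. Sign the server certificate.
openssl finds the CA's files in the default subdirectory ./demoCA (cacert.pem, and cakey.pem under private/), signs the request in server.csr using the configuration file /usr/lib/ssl/openssl.cnf, and writes the certificate to server.crt. The SAN extension has to be supplied again with -extensions SAN, because by default openssl ca does not copy extensions from the request (copy_extensions is commented out in the default openssl.cnf).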
(gpu) root@VM-0-14-ubuntu:~/cert# openssl ca -in server.csr \
> -md sha256 \
> -extensions SAN \
> -config <(cat /usr/lib/ssl/openssl.cnf \
> <(printf "[SAN]\nsubjectAltName=IP.1:106.52.33.185")) \
> -out server.crt
Using configuration from /dev/fd/63
Check that the request matches the signature
Signature ok
Certificate Details:
Serial Number: 1 (0x1)
Validity
Not Before: Nov 2 09:47:58 2022 GMT
Not After : Nov 2 09:47:58 2023 GMT
Subject:
countryName = CN
stateOrProvinceName = GD
organizationName = Jean
organizationalUnitName = Study
commonName = 106.52.33.185
X509v3 extensions:
X509v3 Subject Alternative Name:
IP Address:106.52.33.185
Certificate is to be certified until Nov 2 09:47:58 2023 GMT (365 days)
Sign the certificate? [y/n]:y
1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated
(gpu) root@VM-0-14-ubuntu:~/cert# ls
demoCA server.crt server.csr server.key
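"Write out database with 1 new entries" refers to demoCA/index.txt; openssl ca also advances demoCA/serial and keeps a copy of the signed certificate in demoCA/newcerts named after its serial (01.pem here). A minimal Python sketch for listing what the CA has issued, assuming the usual six tab-separated fields per line: status (V/R/E), expiry, revocation date, hex serial, file name, subject DN. The function name parse_index is hypothetical.

```python
from datetime import datetime, timezone


def parse_index(text: str) -> list[dict]:
    """Parse an `openssl ca` database (index.txt) into a list of dicts."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        status, expiry, revoked, serial, filename, subject = line.split("\t")
        # 13 chars = UTCTime (YYMMDDHHMMSSZ); 15 chars = 4-digit-year form.
        fmt = "%y%m%d%H%M%SZ" if len(expiry) == 13 else "%Y%m%d%H%M%SZ"
        entries.append({
            "status": status,  # V=valid, R=revoked, E=expired
            "expires": datetime.strptime(expiry, fmt).replace(tzinfo=timezone.utc),
            "revoked": revoked or None,
            "serial": int(serial, 16),
            "filename": filename,
            "subject": subject,
        })
    return entries
```

Usage: parse_index(open("/root/cert/demoCA/index.txt").read()).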
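H. Import the self-built CA's root certificate into the browser.
Download the root certificate /root/cert/demoCA/cacert.pem to the client (e.g., Windows 10) and, in the browser (e.g., Chrome), import it into Trusted Root Certification Authorities.
Google Chrome:
Settings -> Privacy and security -> Security -> Advanced -> Manage certificates -> Trusted Root Certification Authorities -> Import -> Next -> Browse -> All Files (*.*)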
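Before trusting the downloaded file, it is worth checking that it is the same certificate as on the server, for example by comparing its fingerprint with the output of openssl x509 -noout -fingerprint -sha256 -in cacert.pem. A minimal Python sketch using only the standard library; it assumes the file contains a single PEM certificate, and the function name pem_fingerprint is hypothetical.

```python
import hashlib
import ssl


def pem_fingerprint(pem_text: str, algorithm: str = "sha256") -> str:
    """Colon-separated uppercase hex fingerprint of a single PEM certificate,
    in the same style as `openssl x509 -noout -fingerprint -sha256`."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.new(algorithm, der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Passing "sha1" gives the SHA-1 form, which is what some certificate dialogs display as the thumbprint.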
I. Enter https://106.52.33.185:8000 in the browser and log in with your username and password.
8. Configure JupyterHub as a service that starts at boot.
1) Create the service unit file.
First, check the PATH of the conda virtual environment "gpu":
(gpu) root@VM-0-14-ubuntu:~# echo $PATH
/usr/local/anaconda3/envs/gpu/bin:/usr/local/anaconda3/condabin:/usr/local/cuda-11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
(gpu) root@VM-0-14-ubuntu:~#
Then create a systemd unit file:
(gpu) root@VM-0-14-ubuntu:~# vi /etc/systemd/system/jupyterhub.service
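Its contents are shown below. Key points:
A. Run as root.
B. Set PATH explicitly. A process started at boot does not go through a login, so /etc/profile and the other files that set environment variables are never run; copy in the PATH shown above.
C. Invoke jupyterhub by its full path.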
[Unit]
Description=Jupyterhub service
After=syslog.target network.target
[Service]
User=root
Environment="PATH=/usr/local/anaconda3/envs/gpu/bin:/usr/local/anaconda3/condabin:/usr/local/cuda-11.2/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
ExecStart=/usr/local/anaconda3/envs/gpu/bin/jupyterhub -f /etc/jupyterhub/config.py
[Install]
WantedBy=multi-user.target
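Points A to C can be captured in a small generator, so the unit file does not have to be retyped after a reinstall. A minimal Python sketch, assuming the unit layout shown above; it prepends the environment's bin directory to PATH if missing and refuses a relative ExecStart. The function name render_unit is hypothetical.

```python
import os


def render_unit(env_prefix: str, config: str, path: str) -> str:
    """Render a jupyterhub.service unit like the one above.

    env_prefix: conda env prefix, e.g. /usr/local/anaconda3/envs/gpu
    path:       PATH to bake in (boot-time services get no login environment)
    """
    exe = os.path.join(env_prefix, "bin", "jupyterhub")
    if not os.path.isabs(exe):
        raise ValueError("ExecStart needs an absolute path")
    entries = path.split(":")
    bindir = os.path.dirname(exe)
    if bindir not in entries:
        entries.insert(0, bindir)  # the env's bin dir must come first
    return "\n".join([
        "[Unit]",
        "Description=Jupyterhub service",
        "After=syslog.target network.target",
        "[Service]",
        "User=root",
        f'Environment="PATH={":".join(entries)}"',
        f"ExecStart={exe} -f {config}",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```

Run inside the activated environment, render_unit(os.environ["CONDA_PREFIX"], "/etc/jupyterhub/config.py", os.environ["PATH"]) reproduces the file above, and the result can be written to /etc/systemd/system/jupyterhub.service.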
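Then enable the service so that it starts at boot (enable does not start it immediately; use the start command below for that):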
(gpu) root@VM-0-14-ubuntu:~# systemctl enable jupyterhub.service
The service can then be managed with the following commands:
# systemctl status jupyterhub.service
# systemctl start jupyterhub.service
# systemctl stop jupyterhub.service
View the service's log with:
(gpu) root@VM-0-14-ubuntu:~# journalctl -u jupyterhub.service -f
In the JupyterHub configuration file shown earlier, the log is also written to a separate file:
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'
so the log can also be read directly from that file.
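For a quick look at that file without journalctl, a minimal Python sketch equivalent to tail -n 20 /var/log/jupyterhub.log (it has no follow mode like -f). The function name tail is hypothetical.

```python
from collections import deque


def tail(path: str, n: int = 20) -> list[str]:
    """Return the last n lines of a text file, like `tail -n N PATH`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the last n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Usage: print("\n".join(tail("/var/log/jupyterhub.log", 50))).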
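With this, JupyterHub starts automatically every time the server reboots.
This concludes the article: the Linux GPU virtual host, together with its GPU and Python deep learning runtime and development environment, is now configured. RStudio, Shiny, and the other parts will be covered in a separate article.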