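Prerequisites
Two files need to be downloaded: one for the host server and one for the BlueField DPU. Choose the files that match your operating system.
The author's host runs Ubuntu 22.04 and used the two files listed below. You must be logged in to an NVIDIA account and have been granted DOCA SDK Early Access before you can download them.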
https://docs.nvidia.com/doca/sdk/nvidia+doca+installation+guide+for+linux/index.html#src-2448907425_NVIDIADOCAInstallationGuideforLinux-BlueFieldDPUImageInstallation
# ls -l
total 1612060
-rw-r--r-- 1 yanghy cord-testdrive-P 1178457336 Jan 9 01:46 DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.23-10.prod.bfb
-rw-r--r-- 1 yanghy cord-testdrive-P 470566450 Jan 9 01:32 doca-host-repo-ubuntu2204_2.5.0-0.0.1.2.5.0108.1.23.10.1.1.9.0_amd64.deb
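To pick the host package that matches the system, one option is to derive the distro tag from os-release and compare it with the file name. The sketch below assumes the tag format seen in the listing above (`ubuntu2204`); `doca_repo_tag` is a hypothetical helper, and the naming for other distributions has not been checked here.

```shell
# Derive the distro tag that appears in the host repo package name
# (e.g. "ubuntu2204") from an os-release style file.
# Hypothetical helper; /etc/os-release is the standard default location.
doca_repo_tag() {
    os_release_file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak out.
    (
        . "$os_release_file"
        printf '%s%s\n' "$ID" "$(printf '%s' "$VERSION_ID" | tr -d '.')"
    )
}

# Usage: ls | grep "$(doca_repo_tag)"
```

On the author's Ubuntu 22.04 host this should yield the same tag that appears in the .deb file name.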
Host initialization
# sudo dpkg -i doca-host-repo-ubuntu2204_2.5.0-0.0.1.2.5.0108.1.23.10.1.1.9.0_amd64.deb
# sudo apt-get update
# sudo apt install -y doca-runtime doca-sdk doca-tools
The last step can take a long time, depending on network quality.
Inspecting the DPU
# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
# mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
------------
DEVICE_TYPE MST PCI RDMA NET NUMA
BlueField2(rev:0) /dev/mst/mt41686_pciconf0.1 81:00.1 mlx5_3 net-enp129s0f1np1 1
BlueField2(rev:0) /dev/mst/mt41686_pciconf0 81:00.0 mlx5_2 net-enp129s0f0np0 1
ConnectX5(rev:0) /dev/mst/mt4119_pciconf0.1 63:00.1 mlx5_1 net-eno34np1 0
ConnectX5(rev:0) /dev/mst/mt4119_pciconf0 63:00.0 mlx5_0 net-eno33np0 0
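The MST device paths in this table are what the later mlxconfig and mlxfwreset commands take after `-d`. A minimal sketch for extracting the BlueField entries, assuming the column layout shown above (`list_bluefield_mst` is a hypothetical helper name):

```shell
# Print the MST device path of every BlueField entry found in
# `mst status -v` output read from stdin.
# Assumes DEVICE_TYPE is column 1 and the MST path is column 2.
list_bluefield_mst() {
    awk '$1 ~ /^BlueField/ { print $2 }'
}

# Usage: sudo mst status -v | list_bluefield_mst
```

With the output above, this prints the two mt41686 paths and skips the ConnectX5 entries.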
Resetting the device (optional)
# sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -y reset
Adjusting the BlueField link type
If your BlueField supports both InfiniBand (IB) and Ethernet, make sure its link type is set to Ethernet.
# sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -e q | grep -i link_type
If the command prints output like the following, the DPU supports both IB and Ethernet. If it prints nothing, the DPU supports Ethernet only.
Configurations: Default Current Next Boot
* LINK_TYPE_P1 IB(1) ETH(2) IB(1)
* LINK_TYPE_P2 IB(1) ETH(2) IB(1)
Use the following command to switch both ports to Ethernet. The value 2 corresponds to ETH(2) in the output above.
# sudo mlxconfig -d /dev/mst/mt41686_pciconf0 s LINK_TYPE_P1=2 LINK_TYPE_P2=2
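To confirm that the change has been staged, the query output can be checked again. The sketch below assumes the three value columns are Default, Current, and Next Boot, as in the header shown above; `ports_not_eth` is a hypothetical helper name.

```shell
# Read `mlxconfig -e q` output on stdin and print every LINK_TYPE_P*
# parameter whose "Next Boot" value (third value after the name)
# is not ETH(2). Prints nothing when all ports are set to Ethernet.
ports_not_eth() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^LINK_TYPE_P[0-9]+$/ && $(i + 3) != "ETH(2)") {
                print $i
            }
        }
    }'
}

# Usage: sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -e q | ports_not_eth
```

An empty result suggests both ports will come up in Ethernet mode on the next boot.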
Configuring the tmfifo interface
After the steps above, a tmfifo_net0 interface appears on the host.
# ip a
...
8: tmfifo_net0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether 00:1a:ca:ff:ff:02 brd ff:ff:ff:ff:ff:ff
inet6 fe80::21a:caff:feff:ff02/64 scope link
valid_lft forever preferred_lft forever
Assign 192.168.100.1/30 to tmfifo_net0 so that the host can reach the DPU over SSH at its default address, 192.168.100.2.
# ifconfig tmfifo_net0 192.168.100.1/30
# ip a
...
8: tmfifo_net0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether 00:1a:ca:ff:ff:02 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.1/30 brd 192.168.100.3 scope global tmfifo_net0
valid_lft forever preferred_lft forever
inet6 fe80::21a:caff:feff:ff02/64 scope link
valid_lft forever preferred_lft forever
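On hosts without net-tools, `ifconfig` is unavailable; the iproute2 equivalent is `ip addr add 192.168.100.1/30 dev tmfifo_net0`. The sketch below checks whether the address is already present before adding it. `has_tmfifo_addr` is a hypothetical helper name, and the check relies on the one-line-per-address format of `ip -o`.

```shell
# Succeeds if `ip -o -4 addr show` style output on stdin contains the
# given address (with prefix length) on tmfifo_net0.
has_tmfifo_addr() {
    want="$1"
    grep -F "tmfifo_net0" | grep -qF "inet $want "
}

# Usage on the host (changing the address needs root):
#   ip -o -4 addr show dev tmfifo_net0 | has_tmfifo_addr 192.168.100.1/30 \
#       || sudo ip addr add 192.168.100.1/30 dev tmfifo_net0
```

An address set this way, like one set with ifconfig, does not persist across a host reboot.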
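The host-side setup is now complete.
DPU initialization
Generating the BlueField configuration file
The Arm cores on the BlueField run their own Ubuntu operating system. This step generates the password hash for the SSH user ubuntu so that it can be injected when the operating system is installed on the BlueField.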
# openssl passwd -1
Password:
Verifying - Password:
$1$bNJw9LAD$xD95f3AHMX3ek0s5gWIEU.
Write the generated password hash into a new file named bf.cfg, in the following form.
ubuntu_PASSWORD='$1$bNJw9LAD$xD95f3AHMX3ek0s5gWIEU.'
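Because the hash contains `$` characters, writing it by hand through the shell can silently mangle it if the quoting is wrong. A minimal sketch that writes the file safely; `write_bf_cfg` is a hypothetical helper name, not part of the DOCA tooling.

```shell
# Write a bf.cfg that contains only the ubuntu password hash.
# $1: the hash produced by `openssl passwd -1`
# $2: output path (defaults to bf.cfg in the current directory)
write_bf_cfg() {
    hash="$1"
    out="${2:-bf.cfg}"
    # Passing the hash through %s keeps every "$" literal; the single
    # quotes are written into the file exactly as in the example above.
    printf "ubuntu_PASSWORD='%s'\n" "$hash" > "$out"
}

# Usage: write_bf_cfg "$(openssl passwd -1)" bf.cfg
```

Note that this overwrites any existing file at the output path.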
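Updating the BlueField image
The BlueField image is installed through the rshim interface.
Check the status of the rshim service and start it if it is not running. Then look for the DPU's rshim device; if the machine has only one BlueField, it is rshim0 by default.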
# sudo systemctl status rshim
# sudo systemctl enable rshim
# sudo systemctl start rshim
# ls -la /dev/ | grep rshim
drwxr-xr-x 2 root root 120 Jan 9 01:36 rshim0
Using the downloaded image, the password hash, and the rshim device, install the operating system on the DPU and inject the login password.
# sudo bfb-install --rshim rshim0 --bfb DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.23-10.prod.bfb --config bf.cfg
This takes ten minutes or more. You can follow the installation progress on the rshim console.
# apt-get install minicom -y
# minicom -D /dev/rshim0/console
The installation has succeeded once bfb-install has printed the complete log below.
Warn: 'pv' command not found. Continue without showing BFB progress.
Pushing bfb + cfg
Collecting BlueField booting status. Press Ctrl+C to stop…
INFO[BL2]: start
INFO[BL2]: boot mode (rshim)
INFO[BL2]: DDR POST passed
INFO[BL2]: UEFI loaded
INFO[BL31]: start
INFO[BL31]: lifecycle GA Non-Secured
INFO[BL31]: runtime
INFO[UEFI]: UPVS valid
WARN[UEFI]: UPVS full
WARN[UEFI]: UPVS reclaim
WARN[UEFI]: Var reclaim
WARN[UEFI]: Var reclaim done
INFO[UEFI]: eMMC init
INFO[UEFI]: eMMC probed
INFO[UEFI]: PMI: updates started
INFO[UEFI]: PMI: total updates: 1
INFO[UEFI]: PMI: updates completed, status 0
INFO[UEFI]: PCIe enum start
INFO[UEFI]: PCIe enum end
INFO[UEFI]: exit Boot Service
INFO[MISC]: : Found bf.cfg
INFO[MISC]: : Ubuntu installation started
INFO[MISC]: Installing OS image
INFO[MISC]: : Changing the default password for user ubuntu
INFO[MISC]: : Installation finished
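When the installation is scripted, the final log line can serve as the success criterion. The sketch below assumes the bfb-install output has been saved to a file (for example with `tee`); `install_finished` is a hypothetical helper name.

```shell
# Succeeds if a saved bfb-install log read from stdin reached the
# final "Installation finished" line shown in the log above.
install_finished() {
    grep -q 'INFO\[MISC\]:.*Installation finished'
}

# Usage:
#   sudo bfb-install --rshim rshim0 \
#       --bfb DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.23-10.prod.bfb \
#       --config bf.cfg 2>&1 | tee bfb-install.log
#   install_finished < bfb-install.log && echo "install OK"
```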
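Note that the DPU reboots automatically after a successful installation, so it cannot be reached over SSH right away. Wait three to four minutes, then log in with the ubuntu password set earlier, using the command below. If SSH fails, tmfifo_net0 inside the DPU may not be up, or it may not have been assigned 192.168.100.2 automatically. In that case, log in to the DPU through the rshim console described above and configure tmfifo_net0 manually.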
# ssh ubuntu@192.168.100.2
The authenticity of host '192.168.100.2 (192.168.100.2)' can't be established.
ED25519 key fingerprint is SHA256:RyTOWjhQM7FV0+kHicc0B7VCVuwGUEMOYH+LExG0Ik0.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.100.2' (ED25519) to the list of known hosts.
ubuntu@192.168.100.2's password:
Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 5.15.0-1032-bluefield aarch64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Tue Sep 19 16:59:20 UTC 2023
System load: 2.64501953125 Processes: 257
Usage of /: 38.6% of 14.22GB Users logged in: 0
Memory usage: 7% IPv4 address for tmfifo_net0: 192.168.100.2
Swap usage: 0%
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
ubuntu@localhost:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: oob_net0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 0c:42:a1:a4:89:96 brd ff:ff:ff:ff:ff:ff
altname enamlnxbf17i0
3: tmfifo_net0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:1a:ca:ff:ff:01 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.2/30 brd 192.168.100.3 scope global noprefixroute tmfifo_net0
valid_lft forever preferred_lft forever
inet6 fe80::21a:caff:feff:ff01/64 scope link
valid_lft forever preferred_lft forever
4: p0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master ovs-system state DOWN group default qlen 1000
link/ether 0c:42:a1:a4:89:90 brd ff:ff:ff:ff:ff:ff
altname enp3s0f0np0
5: p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 0c:42:a1:a4:89:91 brd ff:ff:ff:ff:ff:ff
altname enp3s0f1np1
6: pf0hpf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
link/ether f6:a6:b9:d3:78:99 brd ff:ff:ff:ff:ff:ff
altname enp3s0f0nc1pf0
inet6 fe80::f4a6:b9ff:fed3:7899/64 scope link
valid_lft forever preferred_lft forever
7: pf1hpf: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether d2:86:90:de:e2:6d brd ff:ff:ff:ff:ff:ff
altname enp3s0f1nc1pf1
inet6 fe80::d086:90ff:fede:e26d/64 scope link
valid_lft forever preferred_lft forever
8: en3f0pf0sf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
link/ether be:c9:5f:88:b2:32 brd ff:ff:ff:ff:ff:ff
altname enp3s0f0npf0sf0
9: enp3s0f0s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 02:1a:c9:f8:e7:d1 brd ff:ff:ff:ff:ff:ff
inet6 fe80::1a:c9ff:fef8:e7d1/64 scope link
valid_lft forever preferred_lft forever
11: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 0e:99:20:5c:fb:6a brd ff:ff:ff:ff:ff:ff
12: ovsbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 0c:42:a1:a4:89:90 brd ff:ff:ff:ff:ff:ff
inet6 fe80::e42:a1ff:fea4:8990/64 scope link
valid_lft forever preferred_lft forever
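If the interfaces listed by ip a inside the DPU differ substantially in number and name from the example above (for instance, they look like the listing below), the BlueField is running in Separated mode (NIC mode). See the section on changing the BlueField operating mode later in this guide and switch it to DPU mode (SmartNIC mode).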
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: oob_net0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 00:1a:ca:ff:ff:03 brd ff:ff:ff:ff:ff:ff
altname enamlnxbf17i0
3: tmfifo_net0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:1a:ca:ff:ff:03 brd ff:ff:ff:ff:ff:ff
inet 192.168.100.2/24 brd 192.168.100.255 scope global tmfifo_net0
valid_lft forever preferred_lft forever
inet6 fe80::21a:caff:feff:ff03/64 scope link
valid_lft forever preferred_lft forever
4: p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 0c:42:a1:a4:8a:50 brd ff:ff:ff:ff:ff:ff
altname enp3s0f0np0
inet6 fe80::e42:a1ff:fea4:8a50/64 scope link
valid_lft forever preferred_lft forever
5: p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 0c:42:a1:a4:8a:51 brd ff:ff:ff:ff:ff:ff
altname enp3s0f1np1
inet6 fe80::e42:a1ff:fea4:8a51/64 scope link
valid_lft forever preferred_lft forever
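The difference between the two listings can be checked mechanically: the DPU-mode listing has host representors (pf0hpf, pf1hpf) and an ovsbr bridge, while the NIC-mode listing has neither. The sketch below is a heuristic based only on those two examples; `guess_bf_mode` is a hypothetical helper name, and the mlxconfig query remains the authoritative check.

```shell
# Guess the BlueField mode from `ip a` output (stdin) taken inside the DPU.
# Prints "dpu" if a pfNhpf representor or an ovsbr* bridge is present,
# otherwise prints "nic".
guess_bf_mode() {
    if grep -Eq '^[0-9]+: (pf[0-9]+hpf|ovsbr[0-9]*):'; then
        echo dpu
    else
        echo nic
    fi
}

# Usage (inside the DPU): ip a | guess_bf_mode
```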
Updating the BlueField firmware
Run the following command inside the DPU to update its firmware. (This step may fail on OEM DPU cards.)
# sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl --force-fw-update
Apply the update using any one of the following methods.
1. On both the host and the DPU, run
# sudo mst start
2. On the DPU, run
# sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 --sync 1 -y reset
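If this step fails, use method 3.
3. Reboot the host.
This completes the DOCA installation.
Changing the BlueField operating mode
If, after logging in to the DPU over SSH, you find no devices with the hpf or sf suffix and no bridge with the ovsbr prefix, the DPU is in the wrong operating mode. Follow this section to change it.
Checking the operating mode
Query the DPU's operating mode. If the Current value is SEPARATED_HOST, the DPU is running in NIC mode and should be switched to DPU mode.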
# sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -e q
Device #1:
----------
Device type: BlueField2
Name: MBF2H516A-CENO_Ax
Description: Bluefield-2 SmartNIC 100GbE Dual-Port QSFP56; PCIe Gen4 x16; Crypto Disabled; 16GB on-board DDR; 1GbE OOB management; FHHL
Device: /dev/mst/mt41686_pciconf0
Configurations: Default Current Next Boot
Array[0..7]
* INTERNAL_CPU_MODEL EMBEDDED_CPU(1) SEPARATED_HOST(0) SEPARATED_HOST(0)
Changing the operating mode
Switch the operating mode to DPU mode.
# mlxconfig -d /dev/mst/mt41686_pciconf0 s INTERNAL_CPU_MODEL=1
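Note: if the reboot procedure from the official guide does not take effect after the mode change, run the following command after changing the mode and then reboot again.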
# sudo mlxfwreset -d /dev/mst/mt41686_pciconf0 --sync 1 -y reset
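Before resetting or rebooting, it can be worth confirming that the new value is staged. The sketch below reads the query output and prints the Next Boot value of INTERNAL_CPU_MODEL, assuming the Default, Current, Next Boot column order shown in the query output earlier in this section; `next_boot_cpu_model` is a hypothetical helper name.

```shell
# Print the "Next Boot" value of INTERNAL_CPU_MODEL from
# `mlxconfig -e q` output read from stdin.
# The value is the third field after the parameter name.
next_boot_cpu_model() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "INTERNAL_CPU_MODEL") { print $(i + 3); exit }
        }
    }'
}

# Usage: sudo mlxconfig -d /dev/mst/mt41686_pciconf0 -e q | next_boot_cpu_model
```

After the set command above, the expected value is EMBEDDED_CPU(1).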
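HBN deployment
Deploying HBN is similar to DPU initialization: a configuration file and an image are written to the DPU.
Generating the BlueField configuration file
Write the following content into the bf.cfg file from the DPU initialization step. SFC must be enabled because 'HBN requires service function chaining (SFC) to be activated on the DPU before running the HBN service container'.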
ubuntu_PASSWORD='$1$bNJw9LAD$xD95f3AHMX3ek0s5gWIEU.'
ENABLE_SFC_HBN=yes
NUM_VFs_PHYS_PORT0=4 # <num VFs supported by HBN on Physical Port 0> (valid range: 0-127) Default 14
NUM_VFs_PHYS_PORT1=4 # <num VFs supported by HBN on Physical Port 1> (valid range: 0-127) Default 0
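Since a wrong value only shows up after a lengthy reinstall, a quick range check on bf.cfg beforehand may save time. The sketch below enforces the 0-127 range quoted in the comments above; `check_num_vfs` is a hypothetical helper name, not part of the DOCA tooling.

```shell
# Check that every NUM_VFs_PHYS_PORT* value in a bf.cfg style file
# is an integer in the range 0-127. Returns non-zero on a violation.
# $1: path to the configuration file
check_num_vfs() {
    awk -F'[=# \t]+' '
        BEGIN { bad = 0 }
        /^NUM_VFs_PHYS_PORT[0-9]+=/ {
            if ($2 !~ /^[0-9]+$/ || $2 + 0 > 127) {
                bad = 1
                print "out of range: " $1 "=" $2
            }
        }
        END { exit bad }
    ' "$1"
}

# Usage: check_num_vfs bf.cfg && echo "VF counts OK"
```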
Writing the image and configuration file
# sudo bfb-install --rshim rshim0 --bfb DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.23-10.prod.bfb --config bf.cfg
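The process is the same as in DPU initialization. Once it completes successfully, SSH into the BlueField.
Configuring internet access
The following steps require the DPU to download resources from the internet, so check the default route and DNS settings on the DPU. If the OOB interface inside the DPU cannot reach external networks, the DPU can reach the internet through tmfifo_net0 with NAT configured on the host.
While doing this, run systemctl stop systemd-resolved; otherwise the DNS configuration file keeps being overwritten with empty content.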
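One possible NAT setup is sketched below. It only prints the commands so they can be reviewed before being run. The uplink interface name and the use of iptables are assumptions that do not come from the original text, and `print_nat_commands` is a hypothetical helper name.

```shell
# Print (do not run) one possible set of commands for sharing the host's
# internet connection with the DPU over tmfifo_net0.
# $1: the host's uplink interface (assumption: supplied by the caller)
print_nat_commands() {
    uplink="$1"
    cat <<EOF
# On the host:
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 192.168.100.0/30 -o $uplink -j MASQUERADE
# On the DPU:
ip route add default via 192.168.100.1 dev tmfifo_net0
EOF
}

# Usage: print_nat_commands eno33np0
```

Depending on the host's firewall policy, additional FORWARD rules may also be needed. The DPU still needs a working nameserver entry in its DNS configuration.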
Downloading NGC resources inside the DPU
# wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/doca/doca_container_configs/versions/2.5.0v1/zip -O doca_container_configs_2.5.0v1.zip
# unzip -o doca_container_configs_2.5.0v1.zip -d doca_container_configs_2.5.0v1
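Running the HBN setup script
Change to the directory below inside the extracted archive and run hbn-dpu-setup.sh. The script puts the BlueField into embedded mode, removes unneeded OVS configuration, enables IP forwarding, and configures huge pages. When it finishes, follow its final message to decide whether a reboot is needed.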
# cd doca_container_configs_2.5.0v1/scripts/doca_hbn/2.0.0/
# chmod +x hbn-dpu-setup.sh
# ./hbn-dpu-setup.sh
Created symlink /etc/systemd/system/multi-user.target.wants/rc-bf2-local.service → /lib/systemd/system/rc-bf2-local.service.
HBN setup done for SFC ECMP configuration
Please reboot DPU to setup SFC ECMP Settings
# reboot
Adding the network configuration files
/etc/network/interfaces
/etc/frr/frr.conf
/etc/frr/daemons
Downloading the HBN container YAML file
The downloaded doca_hbn.yaml is identical to /doca_container_configs_2.5.0v1/configs/2.5.0/doca_hbn.yaml, so the copy bundled in the archive can be used directly.
# wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/doca/doca_container_configs/versions/2.5.0v1/files/configs/2.5.0/doca_hbn.yaml
Starting the HBN container
Copying doca_hbn.yaml into /etc/kubelet.d/ starts the HBN container.
# ls /etc/kubelet.d/
doca_telemetry_standalone.yaml
# cp doca_hbn.yaml /etc/kubelet.d/
# ls /etc/kubelet.d/
doca_hbn.yaml doca_telemetry_standalone.yaml
# crictl pods
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
0d16f3aa0bcd5 2 seconds ago Ready doca-hbn-service-localhost default 0 (default)
# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
fa41ae266a098 1e47671bf21db 12 seconds ago Running doca-hbn 0 0d16f3aa0bcd5 doca-hbn-service-localhost
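The container does not necessarily appear immediately after the copy, so a small polling check can be convenient. The sketch below matches the STATE and NAME columns as they appear in the `crictl ps` output above; `container_running` is a hypothetical helper name.

```shell
# Succeeds if `crictl ps` output on stdin lists a container with the
# given name in the Running state (STATE immediately followed by NAME).
# $1: container name, e.g. doca-hbn
container_running() {
    name="$1"
    awk -v name="$name" '
        { for (i = 1; i < NF; i++) if ($i == "Running" && $(i + 1) == name) found = 1 }
        END { exit !found }
    '
}

# Usage: until crictl ps | container_running doca-hbn; do sleep 5; done
```

As written, the usage loop never gives up, so a retry limit is worth adding in practice.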
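In practice, crictl ps sometimes shows no running container after the YAML file has been copied into /etc/kubelet.d/. In that case, pull the image first and then repeat the container start procedure above.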
# docker pull nvcr.io/nvidia/doca/doca_hbn:2.0.0-doca2.5.0
# crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/library/doca_telemetry 1.15.5-doca2.5.0 095a5833a3f80 387MB
k8s.gcr.io/pause 3.2 2a060e2e7101d 487kB
nvcr.io/nvidia/doca/doca_hbn 2.0.0-doca2.5.0 1e47671bf21db 286MB
The HBN installation is now complete.