Linux Server Hardware Inspection Workflow

The performance tests in this guide use sysbench, which can be installed with apt install -y sysbench.

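Contents

  • Operating system
  • CPU
    • Check clock frequency, core count, and thread count
    • Preparing for performance tests
    • Performance tests
  • GPU
    • Check memory size, model, and whether the driver is installed correctly
    • Performance tests
  • Memory
    • Check size
    • Throughput
  • Disk
    • Check size
    • I/O performance
  • Switch / cluster
    • Connectivity between all nodes
    • Network speed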

Operating system

$ lsb_release -a

CPU

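Check clock frequency, core count, and thread count

In the lscpu output, Socket(s) is the number of physical CPU sockets, Core(s) per socket is the number of cores on each chip, and Thread(s) per core is the number of hardware threads each core supports, which shows whether hyper-threading is in use.

The number of logical CPUs is therefore Socket(s) * Core(s) per socket * Thread(s) per core. In the output below that is 2 * 28 * 2 = 112, which matches the CPU(s) field.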

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   112
  On-line CPU(s) list:    0-111
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
    CPU family:           6
    Model:                106
    Thread(s) per core:   2
    Core(s) per socket:   28
    Socket(s):            2
    Stepping:             6
    CPU max MHz:          3100.0000
    CPU min MHz:          800.0000
    BogoMIPS:             4000.00
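
As a quick cross-check of the formula above, the three topology fields can be multiplied directly from the lscpu output. This is only a minimal sketch: logical_cpus is a hypothetical helper name, and it assumes the English field labels shown above (run lscpu under LC_ALL=C if the system locale differs).

```shell
# Hypothetical helper: reads `lscpu` output on stdin and prints
# Socket(s) * Core(s) per socket * Thread(s) per core.
logical_cpus() {
  awk -F: '
    /Thread\(s\) per core/ { t = $2 + 0 }   # "+ 0" strips the padding and forces a number
    /Core\(s\) per socket/ { c = $2 + 0 }
    /Socket\(s\)/          { s = $2 + 0 }
    END { print t * c * s }
  '
}

# Usage on a real machine:
#   LC_ALL=C lscpu | logical_cpus
```

The result should equal the CPU(s) value reported by lscpu; a mismatch usually means some CPUs are offline.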

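Preparing for performance tests

The directory /sys/devices/system/cpu/cpu*/cpufreq holds the frequency settings and information for each CPU, where * is the CPU number (0 to N-1). Each file in it can be read with cat.

/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor holds the type of the CPU frequency governor. The following command changes the mode on all CPUs: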

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

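The available choices are:

  • performance: sets the CPU frequency to the highest value for the best performance. Suited to workloads that need fast response and high processing capacity, at the cost of more power draw and heat.
  • powersave: sets the CPU frequency to the lowest value to save power. Suited to battery-powered devices or power-sensitive workloads.
  • userspace: lets user-space programs set the CPU frequency by writing to the scaling_setspeed attribute.
  • ondemand: adjusts the CPU frequency dynamically according to the current system load. The frequency rises as load increases and drops under light load to save power.
  • conservative: similar to ondemand, but changes frequency more gradually instead of jumping straight to the maximum. Suited to workloads that need a balance between performance and power.
  • schedutil: adjusts the frequency dynamically from the CPU scheduler's utilization data. It is a newer governor and is generally regarded as a replacement for ondemand and conservative, because it is more tightly integrated with the scheduler and has lower overhead.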
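
Before and after switching governors it helps to confirm what every CPU is actually using. The sketch below counts CPUs per governor. governor_summary and its optional root-directory argument are hypothetical conveniences introduced here, not part of any tool. The governors a given machine really offers are listed in the scaling_available_governors file in the same cpufreq directory.

```shell
# Hypothetical helper: print "<count> <governor>" for every governor in use.
# $1 optionally overrides the sysfs CPU directory (defaults to the real one).
governor_summary() {
  root="${1:-/sys/devices/system/cpu}"
  # cpu[0-9]* skips non-CPU entries such as cpufreq/ and cpuidle/
  cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
}

# Usage on a real machine:
#   governor_summary
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
```

A single output line means all CPUs share one governor, which is the state you want before benchmarking.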

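/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq holds the upper limit for frequency scaling.

/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq holds the lower limit for frequency scaling.

/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq holds the current CPU frequency.

You can also keep a separate terminal open running watch -n 1 "cat /proc/cpuinfo | grep 'MHz'" to monitor the current CPU frequency.

/sys/devices/system/cpu/cpu*/cpufreq/base_frequency holds a base reference frequency.

When scaling_governor is performance and base_frequency exists, the CPU frequency does not rise to scaling_max_freq but stays at base_frequency. Likewise, when scaling_governor is powersave and base_frequency exists, the CPU frequency does not drop to scaling_min_freq but stays at base_frequency.

/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq holds the maximum frequency from the CPU information and corresponds to CPU max MHz in lscpu.

/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_min_freq holds the minimum frequency from the CPU information and corresponds to CPU min MHz in lscpu.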
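
The cpufreq files report values in kHz, while lscpu prints MHz, so a small conversion makes the two easier to compare. The following is a sketch only: cpu_freq_report and its optional second argument (an alternative sysfs root) are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: print the frequency limits of one CPU in MHz.
# $1 = CPU number (default 0), $2 = optional sysfs CPU directory.
cpu_freq_report() {
  dir="${2:-/sys/devices/system/cpu}/cpu${1:-0}/cpufreq"
  for f in cpuinfo_min_freq scaling_min_freq scaling_cur_freq \
           scaling_max_freq cpuinfo_max_freq; do
    [ -r "$dir/$f" ] || continue          # skip files this driver does not expose
    # sysfs stores kHz; divide by 1000 to match the MHz shown by lscpu
    awk -v name="$f" '{ printf "%-18s %8.1f MHz\n", name, $1 / 1000 }' "$dir/$f"
  done
}

# Usage on a real machine:
#   cpu_freq_report 0
```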

Performance tests

Single-core test

$ sysbench cpu --cpu-max-prime=20000 --threads=1 --time=30 run

Multi-core test

$ sysbench cpu --cpu-max-prime=20000 --threads=112 --time=30 run

The output includes events per second, per-event latency, and thread fairness:

CPU speed:
    events per second: 50015.22

General statistics:
    total time:                          30.0023s
    total number of events:              1500650

Latency (ms):
         min:                                    0.98
         avg:                                    2.24
         max:                                   26.24
         95th percentile:                        2.26
         sum:                              3358727.27

Threads fairness:
    events (avg/stddev):           13398.6607/139.68
    execution time (avg/stddev):   29.9886/0.01
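
When comparing several runs (for example single-core against multi-core, or before and after a governor change), it is convenient to pull out just the headline number. A minimal sketch, assuming the output layout shown above. events_per_second is a hypothetical helper name.

```shell
# Hypothetical helper: extract the "events per second" value from
# sysbench cpu output read on stdin.
events_per_second() {
  awk -F: '/events per second/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Usage on a real machine:
#   sysbench cpu --cpu-max-prime=20000 --threads=112 --time=30 run | events_per_second
```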

GPU

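Check memory size, model, and whether the driver is installed correctly

For NVIDIA cards, use the command below. The model appears in the Name column, the GPU memory in the Memory-Usage column, and the driver version in the header line.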

$ nvidia-smi
Tue Feb 18 06:41:55 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:52:00.0 Off |                    0 |
| N/A   44C    P0             62W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          Off |   00000000:56:00.0 Off |                    0 |
| N/A   45C    P0             68W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100 80GB PCIe          Off |   00000000:D1:00.0 Off |                    0 |
| N/A   41C    P0             66W /  300W |       1MiB /  81920MiB |      2%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100 80GB PCIe          Off |   00000000:D5:00.0 Off |                    0 |
| N/A   42C    P0             65W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
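
For scripted checks, nvidia-smi can also emit the same information as CSV through its --query-gpu and --format options, which is easier to parse than the table. The sketch below totals the card count and memory. gpu_summary is a hypothetical helper name, and it assumes GPU names contain no comma.

```shell
# Hypothetical helper: reads "name, memory_in_MiB" lines on stdin and
# prints the number of GPUs and their combined memory.
gpu_summary() {
  awk -F', ' '{ n++; mem += $2 } END { printf "%d GPUs, %d MiB total\n", n, mem }'
}

# Usage on a real machine:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | gpu_summary
```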

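Single-GPU / multi-GPU performance

Test with gpu_burn (see the official repository).

However, the dash-prefixed options given in the official documentation did not seem to work, so the run below passes only the test duration in seconds.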

$ cd /home/hx/gpu_burn
$ ./gpu_burn 100
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-751c75f1-b612-d705-c571-9173e4969f8b)
GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-569f62c0-3b97-4d25-a7fc-70b2a2724478)
GPU 2: NVIDIA A100 80GB PCIe (UUID: GPU-2be55775-2295-c096-411d-4f28a4b50ec4)
GPU 3: NVIDIA A100 80GB PCIe (UUID: GPU-4f6920dc-f153-925e-a803-181dd91a232f)
Initialized device 0 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
Initialized device 3 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
Initialized device 2 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
Initialized device 1 with 81037 MB of memory (80602 MB available, using 72542 MB of it), using FLOATS
11.0%  proc'd: 9062 (17018 Gflop/s) - 9062 (16879 Gflop/s) - 9062 (17042 Gflop/s) - 4531 (14077 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 72 C - 73 C - 68 C - 81 C
        Summary at:   Wed Feb 19 04:23:49 AM UTC 2025

24.0%  proc'd: 22655 (16849 Gflop/s) - 18124 (16789 Gflop/s) - 18124 (16967 Gflop/s) - 18124 (15897 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 80 C - 79 C - 75 C - 85 C
        Summary at:   Wed Feb 19 04:24:02 AM UTC 2025

36.0%  proc'd: 31717 (16763 Gflop/s) - 31717 (16672 Gflop/s) - 31717 (16856 Gflop/s) - 27186 (14160 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 83 C - 82 C - 78 C - 84 C
        Summary at:   Wed Feb 19 04:24:14 AM UTC 2025

47.0%  proc'd: 40779 (15897 Gflop/s) - 40779 (16435 Gflop/s) - 45310 (16754 Gflop/s) - 36248 (12967 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 85 C - 84 C - 82 C - 85 C
        Summary at:   Wed Feb 19 04:24:25 AM UTC 2025

58.0%  proc'd: 49841 (14627 Gflop/s) - 54372 (14800 Gflop/s) - 54372 (16656 Gflop/s) - 45310 (11643 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 85 C - 85 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:24:36 AM UTC 2025

69.0%  proc'd: 58903 (13925 Gflop/s) - 63434 (14173 Gflop/s) - 63434 (15784 Gflop/s) - 49841 (10948 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:24:47 AM UTC 2025

80.0%  proc'd: 67965 (13543 Gflop/s) - 72496 (13905 Gflop/s) - 72496 (15110 Gflop/s) - 58903 (10935 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:24:58 AM UTC 2025

91.0%  proc'd: 77027 (13270 Gflop/s) - 77027 (13868 Gflop/s) - 81558 (14820 Gflop/s) - 63434 (10594 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
        Summary at:   Wed Feb 19 04:25:09 AM UTC 2025

100.0%  proc'd: 86089 (13270 Gflop/s) - 86089 (13676 Gflop/s) - 90620 (14577 Gflop/s) - 72496 (10184 Gflop/s)   errors: 0 - 0 - 0 - 0   temps: 84 C - 84 C - 84 C - 84 C
Killing processes.. done

Tested 4 GPUs:
        GPU 0: OK
        GPU 1: OK
        GPU 2: OK
        GPU 3: OK
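
If the burn-in is run unattended, the final summary can be checked automatically. This is a rough sketch that assumes the summary layout shown above ("Tested N GPUs:" followed by one "GPU i: OK" line per card). gpu_burn_all_ok is a hypothetical helper name, and any status other than OK is treated as a failure.

```shell
# Hypothetical helper: reads gpu_burn output on stdin and succeeds only if
# the number of "GPU i: OK" lines equals the N in "Tested N GPUs:".
gpu_burn_all_ok() {
  awk '
    /^Tested [0-9]+ GPUs/ { expected = $2 }   # e.g. "Tested 4 GPUs:"
    /GPU [0-9]+: OK$/     { ok++ }            # per-card result lines
    END { exit !(expected > 0 && ok == expected) }
  '
}

# Usage on a real machine:
#   ./gpu_burn 100 | tee burn.log | gpu_burn_all_ok && echo "all GPUs passed"
```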

Memory

Check size

$ sudo lshw -C memory
  *-memory
       description: System Memory
       physical id: 4f
       slot: System board or motherboard
       size: 256GiB
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           251Gi       1.0Gi       247Gi       2.0Mi       2.9Gi       249Gi
Swap:          8.0Gi          0B       8.0Gi

Throughput

Multi-threaded random write throughput

$ sysbench memory --memory-block-size=1M --memory-total-size=200G --threads=50 --memory-access-mode=rnd run

Multi-threaded random read throughput

$ sysbench memory --memory-block-size=1M --memory-total-size=200G --memory-access-mode=rnd --threads=50 --memory-oper=read run

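Disk

Check size and partition layout

Commands such as lsblk, fdisk, and df -h report capacity using 1 GB = 1024 MB, whereas drive vendors usually use 1 GB = 1000 MB, so the reported capacity looks noticeably smaller than expected. Of these tools, only parted shows a size that matches the labeled capacity.

In the lsblk output, a ROTA value of 1 indicates an HDD and 0 indicates an SSD.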

$ lsblk --output NAME,ROTA,SIZE,TYPE,RM,RO,MOUNTPOINTS
nvme0n1                      0  3.5T disk  0  0
├─nvme0n1p1                  0    1G part  0  0 /boot/efi
├─nvme0n1p2                  0    2G part  0  0 /boot
└─nvme0n1p3                  0  3.5T part  0  0
  └─ubuntu--vg-ubuntu--lv    0  100G lvm   0  0 /
$ sudo parted /dev/nvme0n1 print
[sudo] password for hx:
Model: SAMSUNG MZQL23T8HCLS-00A07 (nvme)
Disk /dev/nvme0n1: 3841GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1128MB  1127MB  fat32              boot, esp
 2      1128MB  3276MB  2147MB  ext4
 3      3276MB  3841GB  3837GB
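
The gap between the 3.5T shown by lsblk and the 3841GB shown by parted is purely a matter of units, which a short calculation confirms. gb_to_tib is a hypothetical helper name introduced for this illustration.

```shell
# Hypothetical helper: convert a vendor-style capacity in decimal GB
# (1 GB = 10^9 bytes) to binary TiB (1 TiB = 1024^4 bytes).
gb_to_tib() {
  awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1e9 / (1024 ^ 4) }'
}

gb_to_tib 3841   # prints 3.49, which lsblk rounds to 3.5T
```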

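I/O performance

To benchmark disk reads and writes, sysbench first needs prepare to create the test files, then run to execute the test, and finally cleanup to delete the test data.

Multi-threaded random write test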

$ sysbench fileio --file-total-size=25G --file-test-mode=rndwr --threads=10 --file-num=100 prepare
$ sysbench fileio --file-total-size=25G --file-test-mode=rndwr --threads=10 --file-num=100 --report-interval=1 run
$ sysbench fileio --file-total-size=25G --file-test-mode=rndwr --threads=10 --file-num=100 cleanup

Multi-threaded random read test

$ sysbench fileio --file-total-size=25G --file-test-mode=rndrd --threads=10 --file-num=100 prepare
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrd --threads=10 --file-num=100 --report-interval=1 run
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrd --threads=10 --file-num=100 cleanup

Multi-threaded mixed random read/write, with a read:write ratio of 6:4

$ sysbench fileio --file-total-size=25G --file-test-mode=rndrw --threads=10 --file-num=100 --file-rw-ratio=1.5 prepare
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrw --threads=10 --file-num=100 --file-rw-ratio=1.5 --report-interval=1 run
$ sysbench fileio --file-total-size=25G --file-test-mode=rndrw --threads=10 --file-num=100 --file-rw-ratio=1.5 cleanup
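
Since the three steps always share the same options, they can be wrapped so that cleanup runs even when run fails and the 25G of test files are not left behind. This is a sketch under stated assumptions: fileio_cycle is a hypothetical wrapper, and the SYSBENCH variable is a hypothetical override (defaulting to sysbench) that makes a dry run with echo possible.

```shell
# Hypothetical wrapper: run prepare, run, and cleanup for one test mode.
# $1 = file test mode (rndwr, rndrd, rndrw); extra options follow.
fileio_cycle() {
  mode="$1"; shift
  common="--file-total-size=25G --file-test-mode=$mode --threads=10 --file-num=100"
  # $common is left unquoted on purpose so it splits into separate options
  "${SYSBENCH:-sysbench}" fileio $common "$@" prepare &&
  "${SYSBENCH:-sysbench}" fileio $common "$@" --report-interval=1 run
  rc=$?
  # always remove the test files, then report the prepare/run status
  "${SYSBENCH:-sysbench}" fileio $common "$@" cleanup
  return $rc
}

# Usage on a real machine:
#   fileio_cycle rndrw --file-rw-ratio=1.5
# Dry run that only prints the commands:
#   SYSBENCH=echo fileio_cycle rndrd
```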

Switch / cluster

Network connectivity

Network speed
