blktrace简单介绍

blktrace的软件包。

[root@localhost ~]# yum provides blktrace
blktrace-1.2.0-10.el8.x86_64 : Utilities for performing block layer IO tracing in the Linux kernel
Repo        : @System
Matched from:
Provide    : blktrace = 1.2.0-10.el8

blktrace-1.2.0-10.el8.x86_64 : Utilities for performing block layer IO tracing in the Linux kernel
Repo        : BaseOS
Matched from:
Provide    : blktrace = 1.2.0-10.el8

blktrace的作用

blktrace是一种块层IO跟踪机制,它提供了关于请求队列操作的详细信息,直至用户空间。有三个主要组件:内核组件、记录内核到用户空间的i/o跟踪信息的实用程序,以及分析和查看跟踪信息的实用程序。此手册页描述了blktrace,它将特定块设备的i/o事件跟踪信息记录到文件中。

DESCRIPTION
       blktrace is a block layer IO tracing mechanism which provides detailed information about request queue operations up to user space. There are three major components: a kernel component, a utility
       to record the i/o trace information for the kernel to user space, and utilities to analyse and view the trace information.  This man page describes blktrace, which records  the  i/o  event  trace
       information for a specific block device to a file.

       The  blktrace  utility  extracts  event traces from the kernel (via the relaying through the debug file system). Some background details concerning the run-time behaviour of blktrace will help to
       understand some of the more arcane command line options:

       - blktrace receives data from the kernel in buffers passed up through the debug file system (relay). Each device being traced has a file created in the mounted directory for  the  debugfs,  which
         defaults to /sys/kernel/debug -- this can be overridden with the -r command line argument.

       - blktrace defaults to collecting all events that can be traced. To limit the events being captured, you can specify one or more filter masks via the -a option.

         Alternatively, one may specify the entire mask utilising a hexadecimal value that is version-specific. (Requires understanding of the internal representation of the filter mask.)

       - As noted above, the events are passed up via a series of buffers stored into debugfs files. The size and number of buffers can be specified via the -b and -n arguments respectively.

       - blktrace  stores  the extracted data into files stored in the local directory. The format of the file names is (by default) device.blktrace.cpu, where device is the base device name (e.g, if we
         are tracing /dev/sda, the base device name would be sda); and cpu identifies a CPU for the event stream.

         The device portion of the event file name can be changed via the -o option.

       - blktrace may also be run concurrently with blkparse to produce live output -- to do this specify -o - for blktrace.

       - The default behaviour for blktrace is to run forever until explicitly killed by the user (via a control-C, or sending SIGINT signal to the process via invocation the kill (1) utility). Also you
         can specify a run-time duration for blktrace via the -w option -- then blktrace will run for the specified number of seconds, and then halt.

blktrace的使用

debugfs

blktrace需要debugfs的支持,如果系统提示debugfs没有mount,需先mount。
执行 mount -t debugfs none /sys/kernel/debug 再执行blktrace命令。

blktrace、blkparse、btt

这三个命令都属于blktrace这个软件包,是一家人。

blktrace实时查看硬盘的信息

[root@localhost blktrace]# blktrace  -d /dev/nvme0n1 -o- |blkparse -i -
259,0   58        1     0.000000000 38746  A WSM 1819522465 + 3 <- (253,0) 52456865
259,0   58        2     0.000001051 38746  A WSM 1822850465 + 3 <- (259,3) 1819522465

259,0   143       10     0.000419495     0  C  WM 1850159232 + 32 [0]
^CCPU58 (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:           1,        1KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        1,        1KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            10
 IO unplugs:             1               Timer unplugs:           0
CPU141 (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:           9,       92KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        9,       92KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            10
 IO unplugs:             1               Timer unplugs:           0
CPU143 (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        0,        0KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:       10,       94KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            10
 IO unplugs:             0               Timer unplugs:           0

Total (259,0):
 Reads Queued:           0,        0KiB  Writes Queued:          10,       94KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:       10,       94KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:       10,       94KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             2               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (259,0): 64 entries
Skips: 0 forward (0 -   0.0%)

blktrace收集硬盘的信息

可以先用如下命令采集信息,待所有信息采集完毕后,统一分析所有采集到的数据。搜集信息的命令如下:

blktrace -d /dev/sdb -w 300

注意,这个命令并不是只输出一个文件,他会根据CPU的个数上,每一个CPU都会输出一个文件,如下所示:

[root@localhost home]# blktrace -d /dev/nvme0n1 -w 10
=== nvme0n1 ===
  CPU  0:                    0 events,        0 KiB data
  CPU  1:                    0 events,        0 KiB data
  CPU  2:                    0 events,        0 KiB data

  CPU142:                    0 events,        0 KiB data
  CPU143:                    5 events,        1 KiB data
  Total:                    43 events (dropped 0),        3 KiB data
[root@localhost home]# ls |grep -i nvme |wc -l
144
[root@localhost home]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              144

[root@localhost home]# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              144

blkparse 分析采集的数据

[root@localhost home]# blkparse -i nvme0n1
Input file nvme0n1.blktrace.22 added
Input file nvme0n1.blktrace.138 added
Input file nvme0n1.blktrace.139 added
Input file nvme0n1.blktrace.143 added
259,0   138        1     0.000000000 41470  A WSM 887728752 + 23 <- (253,2) 879338096
259,0   138        2     0.000000428 41470  A WSM 891056752 + 23 <- (259,3) 887728752
259,0   138        3     0.000001042 41470  Q WSM 891056752 + 23 [kworker/138:0]


Total (nvme0n1):
 Reads Queued:           0,        0KiB  Writes Queued:           6,       79KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        3,       79KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        3,       79KiB
 Read Merges:            0,        0KiB  Write Merges:            3,       48KiB
 IO unplugs:             2               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (nvme0n1): 34 entries
Skips: 0 forward (0 -   0.0%)

但blkparse 分析的内容仍然太多,太杂,不利于人的理解,这个时候可以使用btt工具了就。

btt工具的使用

首先使用gblkparse将不同cpu的多个文件聚合成一个文件。

[root@localhost home]# blkparse -i nvme0n1 -d nvme0n1.blktrace.bin
Input file nvme0n1.blktrace.22 added
Input file nvme0n1.blktrace.138 added
Input file nvme0n1.blktrace.139 added
Input file nvme0n1.blktrace.143 added
259,0   138        1     0.000000000 41470  A WSM 887728752 + 23 <- (253,2) 879338096
259,0   138        2     0.000000428 41470  A WSM 891056752 + 23 <- (259,3) 887728752


Total (nvme0n1):
 Reads Queued:           0,        0KiB  Writes Queued:           6,       79KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        3,       79KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        3,       79KiB
 Read Merges:            0,        0KiB  Write Merges:            3,       48KiB
 IO unplugs:             2               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (nvme0n1): 34 entries
Skips: 0 forward (0 -   0.0%)

[root@localhost home]# ll |grep *.bin
-rw-r--r--. 1 root root 2152 Jul  3 18:54 nvme0n1.blktrace.bin

然后btt就可以分析这个sdb.blktrace.bin了:

[root@localhost home]# btt -i nvme0n1.blktrace.bin 
==================== All Devices ====================

            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000004206   0.000073153   0.000339809           5
Q2G               0.000001058   0.000004088   0.000006848           3
Q2M               0.000000636   0.000001158   0.000002053           3
M2D               0.000009427   0.000014686   0.000019899           3
D2C               0.000080536   0.000092041   0.000108863           6
Q2C               0.000096052   0.000110561   0.000123132           6

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (259,  0) |   1.8488%   0.0000%   0.5237%   0.0000%  83.2490%
---------- | --------- --------- --------- --------- ---------
   Overall |   1.8488%   0.0000%   0.5237%   0.0000%  83.2490%

==================== Device Merge Information ====================

       DEV |       #Q       #D   Ratio |   BLKmin   BLKavg   BLKmax    Total
---------- | -------- -------- ------- | -------- -------- -------- --------
 (259,  0) |        6        3     2.0 |       23       53       72      159

==================== Device Q2Q Seek Information ====================

       DEV |          NSEEKS            MEAN          MEDIAN | MODE           
---------- | --------------- --------------- --------------- | ---------------
 (259,  0) |               6     295065809.3               0 | 0(3)
---------- | --------------- --------------- --------------- | ---------------
   Overall |          NSEEKS            MEAN          MEDIAN | MODE           
   Average |               6     295065809.3               0 | 0(3)

==================== Device D2D Seek Information ====================

       DEV |          NSEEKS            MEAN          MEDIAN | MODE           
---------- | --------------- --------------- --------------- | ---------------
 (259,  0) |               3     590131618.7       879337976 | 128(1)
                                                               ...(2 more)
---------- | --------------- --------------- --------------- | ---------------
   Overall |          NSEEKS            MEAN          MEDIAN | MODE           
   Average |               3     590131618.7       879337976 | 128(1)
                                                               ...(2 more)


==================== Plug Information ====================

#          xfsaild : q activity
  0.000340851   3.0
  0.000340851   3.4
  0.000366808   3.4
  0.000366808   3.0

#          xfsaild : c activity

blktrace结果的分析

blkparse显示的各指标点示意

Q------->G------------>I--------->M------------------->D--------------------------------->C
|-Q time-|-Insert time-|
|--------- merge time ------------|-merge with other IO|
|----------------scheduler time time-------------------|---driver,adapter,storagetime---|

|----------------------- await time in iostat output -----------------------------------------|

各个指标含义

Q2Q — time between requests sent to the block layer
Q2G — time from a block I/O is queued to the time it gets a request allocated for it
G2I — time from a request is allocated to the time it is Inserted into the device's queue
Q2M — time from a block I/O is queued to the time it gets merged with an existing request
I2D — time from a request is inserted into the device's queue to the time it is actually issued to the device
M2D — time from a block I/O is merged with an exiting request until the request is issued to the device
D2C — service time of the request by the device
Q2C — total time spent in the block layer for a request

Q2I – time it takes to process an I/O prior to it being inserted or merged onto a request queue – Includes split, and remap time
I2D – time the I/O is “idle” on the request queue
D2C – time the I/O is “active” in the driver and on the device
Q2I + I2D + D2C = Q2C
Q2C: Total processing time of the I/O

案例

后续补充。

©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • 在Linux系统上,如果I/O发生性能问题,有没有办法进一步定位故障位置呢?iostat等最常用的工具肯定是指望不...
    pandazhong阅读 1,672评论 0 1
  • 出处: https://lwn.net/Articles/738449/The Linux block layer...
    项男阅读 820评论 0 0
  • NAME dnsmasq - A lightweight DHCP and caching DNS server....
    ximitc阅读 2,929评论 0 0
  • 文章主要列出了常见的排查工具,也有部分线上问题案例; JVM 问题排查 JDK工具包 javap 反编译工具jav...
    霸体阅读 719评论 0 0
  • http://lwn.net/Articles/438256/ Since the dawn of time, o...
    项男阅读 345评论 0 0