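Overview
This article gives a brief introduction to the flag parameters used in dd tests.
The dd man page
First, let's look at which flag parameters are available in a dd test. The relevant part of the dd manual reads: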
'if=FILE'
Read from FILE instead of standard input.
'of=FILE'
Write to FILE instead of standard output. Unless 'conv=notrunc' is
given, 'dd' truncates FILE to zero bytes (or the size specified
with 'seek=').
'bs=BYTES'
Set both input and output block sizes to BYTES. This makes 'dd'
read and write BYTES per block, overriding any 'ibs' and 'obs'
settings. In addition, if no data-transforming 'conv' option is
specified, input is copied to the output as soon as it's read, even
if it is smaller than the block size.
'count=N'
Copy N 'ibs'-byte blocks from the input file, instead of everything
until the end of the file. If 'iflag=count_bytes' is specified, N
is interpreted as a byte count rather than a block count. Note if
the input may return short reads as could be the case when reading
from a pipe for example, 'iflag=fullblock' will ensure that
'count=' corresponds to complete input blocks rather than the
traditional POSIX specified behavior of counting input read
operations.
'conv=CONVERSION[,CONVERSION]...'
Convert the file as specified by the CONVERSION argument(s). (No
spaces around any comma(s).)
Conversions:
'ascii'
Convert EBCDIC to ASCII, using the conversion table specified
by POSIX. This provides a 1:1 translation for all 256 bytes.
'ebcdic'
Convert ASCII to EBCDIC. This is the inverse of the 'ascii'
conversion.
'ibm'
Convert ASCII to alternate EBCDIC, using the alternate
conversion table specified by POSIX. This is not a 1:1
translation, but reflects common historical practice for '~',
'[', and ']'.
The 'ascii', 'ebcdic', and 'ibm' conversions are mutually
exclusive.
'block'
For each line in the input, output 'cbs' bytes, replacing the
input newline with a space and padding with spaces as
necessary.
'unblock'
Remove any trailing spaces in each 'cbs'-sized input block,
and append a newline.
The 'block' and 'unblock' conversions are mutually exclusive.
'lcase'
Change uppercase letters to lowercase.
'ucase'
Change lowercase letters to uppercase.
The 'lcase' and 'ucase' conversions are mutually exclusive.
'sparse'
Try to seek rather than write NUL output blocks. On a file
system that supports sparse files, this will create sparse
output when extending the output file. Be careful when using
this option in conjunction with 'conv=notrunc' or
'oflag=append'. With 'conv=notrunc', existing data in the
output file corresponding to NUL blocks from the input, will
be untouched. With 'oflag=append' the seeks performed will be
ineffective. Similarly, when the output is a device rather
than a file, NUL input blocks are not copied, and therefore
this option is most useful with virtual or pre zeroed devices.
'swab'
Swap every pair of input bytes. GNU 'dd', unlike others,
works when an odd number of bytes are read--the last byte is
simply copied (since there is nothing to swap it with).
'sync'
Pad every input block to size of 'ibs' with trailing zero
bytes. When used with 'block' or 'unblock', pad with spaces
instead of zero bytes.
The following "conversions" are really file flags and don't affect
internal processing:
'excl'
Fail if the output file already exists; 'dd' must create the
output file itself.
'nocreat'
Do not create the output file; the output file must already
exist.
The 'excl' and 'nocreat' conversions are mutually exclusive.
'notrunc'
Do not truncate the output file.
'noerror'
Continue after read errors.
'fdatasync'
Synchronize output data just before finishing. This forces a
physical write of output data.
'fsync'
Synchronize output data and metadata just before finishing.
This forces a physical write of output data and metadata.
'iflag=FLAG[,FLAG]...'
Access the input file using the flags specified by the FLAG
argument(s). (No spaces around any comma(s).)
'oflag=FLAG[,FLAG]...'
Access the output file using the flags specified by the FLAG
argument(s). (No spaces around any comma(s).)
Here are the flags. Not every flag is supported on every operating
system.
'append'
Write in append mode, so that even if some other process is
writing to this file, every 'dd' write will append to the
current contents of the file. This flag makes sense only for
output. If you combine this flag with the 'of=FILE' operand,
you should also specify 'conv=notrunc' unless you want the
output file to be truncated before being appended to.
'cio'
Use concurrent I/O mode for data. This mode performs direct
I/O and drops the POSIX requirement to serialize all I/O to
the same file. A file cannot be opened in CIO mode and with a
standard open at the same time.
'direct'
Use direct I/O for data, avoiding the buffer cache. Note that
the kernel may impose restrictions on read or write buffer
sizes. For example, with an ext4 destination file system and
a linux-based kernel, using 'oflag=direct' will cause writes
to fail with 'EINVAL' if the output buffer size is not a
multiple of 512.
'directory'
Fail unless the file is a directory. Most operating systems
do not allow I/O to a directory, so this flag has limited
utility.
'dsync'
Use synchronized I/O for data. For the output file, this
forces a physical write of output data on each write. For the
input file, this flag can matter when reading from a remote
file that has been written to synchronously by some other
process. Metadata (e.g., last-access and last-modified time)
is not necessarily synchronized.
'sync'
Use synchronized I/O for both data and metadata.
'nocache'
Discard the data cache for a file. When count=0 all cache is
discarded, otherwise the cache is dropped for the processed
portion of the file. Also when count=0 failure to discard the
cache is diagnosed and reflected in the exit status. Here are
some usage examples:
# Advise to drop cache for whole file
dd if=ifile iflag=nocache count=0
# Ensure drop cache for the whole file
dd of=ofile oflag=nocache conv=notrunc,fdatasync count=0
# Drop cache for part of file
dd if=ifile iflag=nocache skip=10 count=10 of=/dev/null
# Stream data using just the read-ahead cache
dd if=ifile of=ofile iflag=nocache oflag=nocache
'nonblock'
Use non-blocking I/O.
'noatime'
Do not update the file's access time. Some older file systems
silently ignore this flag, so it is a good idea to test it on
your files before relying on it.
'noctty'
Do not assign the file to be a controlling terminal for 'dd'.
This has no effect when the file is not a terminal. On many
hosts (e.g., GNU/Linux hosts), this option has no effect at
all.
'nofollow'
Do not follow symbolic links.
'nolinks'
Fail if the file has multiple hard links.
'binary'
Use binary I/O. This option has an effect only on nonstandard
platforms that distinguish binary from text I/O.
'text'
Use text I/O. Like 'binary', this option has no effect on
standard platforms.
'fullblock'
Accumulate full blocks from input. The 'read' system call may
return early if a full block is not available. When that
happens, continue calling 'read' to fill the remainder of the
block. This flag can be used only with 'iflag'. This flag is
useful with pipes for example as they may return short reads.
In that case, this flag is needed to ensure that a 'count='
argument is interpreted as a block count rather than a count
of read operations.
'count_bytes'
Interpret the 'count=' operand as a byte count, rather than a
block count, which allows specifying a length that is not a
multiple of the I/O block size. This flag can be used only
with 'iflag'.
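As a small illustration of two of the operands quoted above, the following sketch exercises 'iflag=count_bytes' and 'conv=notrunc'. It assumes GNU dd and GNU stat; the scratch file name is chosen by mktemp and is not part of the original article.

```shell
#!/bin/sh
# Scratch file for the demonstration; mktemp picks the actual name.
f=$(mktemp)

# count_bytes: copy exactly 1500000 bytes even though bs is 1 MiB.
dd if=/dev/zero of="$f" bs=1M count=1500000 iflag=count_bytes 2>/dev/null
size_after_create=$(stat -c %s "$f")

# conv=notrunc: overwrite the first 4 bytes without truncating the rest.
printf 'ABCD' | dd of="$f" bs=4 count=1 conv=notrunc 2>/dev/null
size_after_patch=$(stat -c %s "$f")
first_bytes=$(head -c 4 "$f")

echo "created: $size_after_create bytes"
echo "after patch: $size_after_patch bytes, first bytes: $first_bytes"
rm -f "$f"
```

Without 'conv=notrunc', the second dd would truncate the file before writing, leaving only the 4 patched bytes.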
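We will focus on direct, dsync and sync. Before doing so, it helps to understand the Linux I/O stack, shown below:
Linux I/O stack
The figure above is fairly complex; it can be simplified into the figure below:
Linux disk I/O can be divided into the following layers:
Virtual file system layer
File system layer
Cache layer
Generic block layer
I/O scheduling layer
Driver layer
Physical device layer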
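Virtual file system layer
Applications generally do not talk to physical devices directly; they almost always go through a file system. There are many kinds of file systems, such as the block-device-based ext family and xfs, the network file system nfs, and so on, and each has its own interfaces and implementation. This raises a question: must an application special-case every file system? It does not, because of the virtual file system (VFS). The VFS layer sits above the file system layer, hides the differences between file systems, and presents one unified, virtual file system interface to the application layer, so an application can operate on any file system through a single set of interfaces.
File system layer
Each concrete file system implements its functionality against the unified interface defined by the virtual file system. File systems fall into three categories:
1. Block-device-based file systems, such as ext2/3/4 and xfs;
2. Network file systems, such as nfs and cifs;
3. Special file systems, such as /proc and raw device files.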
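The unified interface can be seen from the shell: the same stat call works on paths backed by quite different file systems. This is a minimal sketch assuming GNU stat; the three paths are only examples and may map to different file system types on different machines.

```shell
#!/bin/sh
# Report the file system type behind a few paths using one and the same call.
# The paths are illustrative; adjust them to the machine at hand.
fs_report=""
for path in / /proc /tmp; do
    [ -e "$path" ] || continue
    fstype=$(stat -f -c %T "$path")
    fs_report="${fs_report}${path} -> ${fstype}
"
done
printf '%s' "$fs_report"
```

The caller never needs to know which file system implementation answers the request; that dispatch is the job of the VFS layer.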
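Cache layer
Compared with the CPU and memory, disk I/O is slow, so Linux adds a cache layer to speed it up. By default, I/O data is placed in the cache and control returns to the upper layer immediately; the kernel later writes the data to the device, or the upper layer reads the cached data back. For writes, because the call returns as soon as the data is in the cache, the upper layer considers the I/O finished even though the data has not yet reached the disk; if the machine loses power at that moment, the data is lost. An application that must be sure the data has reached the physical device can call a flush interface (such as fsync), which writes the cached data out to the device. Linux also provides a way to bypass the cache layer: if the file is opened with the direct flag (O_DIRECT), data skips the cache layer and continues down the stack.
The amount of data currently cached can be seen with free; it is the buff/cache column in the figure below: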
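The effect of the cache layer can be observed roughly as follows. This sketch assumes a Linux system with /proc/meminfo and GNU stat; the scratch file comes from mktemp, and the counters shown will vary with whatever else the machine is doing (on a memory-backed file system the Dirty value may not move at all).

```shell
#!/bin/sh
# Scratch file on whatever file system mktemp uses.
f=$(mktemp)

# Page-cache counters before the write (values are in kB).
grep -E '^(Cached|Dirty):' /proc/meminfo

# Buffered write: dd returns once the data is in the page cache.
dd if=/dev/zero of="$f" bs=1M count=16 2>/dev/null
grep -E '^(Cached|Dirty):' /proc/meminfo

# Ask the kernel to flush dirty pages to the device, then look again.
sync
grep -E '^(Cached|Dirty):' /proc/meminfo

written=$(stat -c %s "$f")
rm -f "$f"
echo "wrote $written bytes"
```

Here sync plays the role of the flush interface mentioned above; inside dd, 'conv=fsync' or 'conv=fdatasync' requests the same kind of flush for the output file only.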
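Generic block layer
Devices come in many kinds with different interfaces, so the generic block layer was added to hide these differences. A file system only needs to talk to the unified generic block layer to communicate with a device and does not need to care how the device driver is implemented, which simplifies file system implementation.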
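I/O scheduling layer
Disk I/O requests arrive at random, and so do the disk locations they target. To reduce disk seeks and increase overall disk throughput, Linux adds an I/O scheduling layer. The I/O scheduling layer uses scheduling algorithms to sort and merge I/O requests more sensibly; the classic one is the elevator algorithm.
Think of disk I/O requests as elevator rides, with requests for floors 3, 2, 6 and 4. Without a scheduling algorithm, the elevator goes from floor 1 to 3, from 3 to 2, from 2 to 6 and then from 6 to 4, wasting elevator capacity. With a scheduling algorithm that orders the requests sensibly, the elevator stops at floors 2, 3, 4 and 6 in turn, and a single trip from floor 1 to floor 6 serves every request.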
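The elevator example can be worked through numerically. The following is only a toy model of the idea (sort the pending requests, then serve them in one sweep), not how a real kernel scheduler is implemented; the floor numbers are the ones from the text.

```shell
#!/bin/sh
# Floors requested, in arrival order (taken from the example in the text).
requests="3 2 6 4"

# No scheduling: serve in arrival order, starting from floor 1.
pos=1; fifo_travel=0
for floor in $requests; do
    if [ "$floor" -ge "$pos" ]; then d=$((floor - pos)); else d=$((pos - floor)); fi
    fifo_travel=$((fifo_travel + d))
    pos=$floor
done

# Elevator style: sort the requests and serve them in one upward sweep.
sorted=$(printf '%s\n' $requests | sort -n | tr '\n' ' ')
pos=1; sweep_travel=0
for floor in $sorted; do
    sweep_travel=$((sweep_travel + floor - pos))
    pos=$floor
done

echo "arrival order: $requests (floors travelled: $fifo_travel)"
echo "sorted order:  $sorted(floors travelled: $sweep_travel)"
```

In this model the unsorted order travels 9 floors while the sorted sweep travels 5, which is the saving the analogy describes.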
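Driver layer
The driver layer holds the drivers for the various physical devices and is how the kernel communicates with them. The kernel provides generic driver interfaces; a device vendor implements a driver against those interfaces and registers it with the kernel, enabling communication between the kernel and the device.
Physical device layer
The physical disks that provide the actual storage: slower devices such as traditional hard disk drives (HDD), and faster ones such as solid-state drives (SSD), including NVMe. Physical disks also carry their own cache to improve I/O speed. Some drives (typically enterprise models with power-loss protection) include capacitors so that cached data can still be flushed to the medium if power is lost.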
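Whether a device advertises a volatile write cache can be checked from sysfs on reasonably recent Linux kernels. This sketch assumes the queue/write_cache attribute is present; it may be missing on older kernels or inside containers, in which case the loop simply prints nothing.

```shell
#!/bin/sh
# List the write-cache mode the kernel reports for each block device.
found=0
for attr in /sys/block/*/queue/write_cache; do
    [ -r "$attr" ] || continue          # skip if the glob did not match
    dev=${attr#/sys/block/}; dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(cat "$attr")"
    found=$((found + 1))
done
echo "devices with a write_cache attribute: $found"
```

A value of "write back" indicates the kernel treats the device cache as volatile and will issue flushes when a sync is requested; "write through" indicates it does not need to.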
Comparison of common parameters
conv flags
'fdatasync'
Synchronize output data just before finishing. This forces a
physical write of output data.
'fsync'
Synchronize output data and metadata just before finishing.
This forces a physical write of output data and metadata.
oflag parameters
'direct'
Use direct I/O for data, avoiding the buffer cache. Note that
the kernel may impose restrictions on read or write buffer
sizes. For example, with an ext4 destination file system and
a linux-based kernel, using 'oflag=direct' will cause writes
to fail with 'EINVAL' if the output buffer size is not a
multiple of 512.
'dsync'
Use synchronized I/O for data. For the output file, this
forces a physical write of output data on each write. For the
input file, this flag can matter when reading from a remote
file that has been written to synchronously by some other
process. Metadata (e.g., last-access and last-modified time)
is not necessarily synchronized.
'sync'
Use synchronized I/O for both data and metadata.
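No oflag
Without oflag, dd opens the output file in the default way, which is buffered I/O: a write returns as soon as the data is in the cache layer, so this is the fastest mode.
oflag=direct
The output file is opened for direct I/O (O_DIRECT): data bypasses the cache layer, and a write returns once the data has reached the disk's own cache, so this is slower than the buffered I/O mode above.
oflag=sync
The output file is opened for synchronized I/O (O_SYNC): a write returns only after the data is fully on disk, so this is slower than writing only as far as the disk cache.
oflag=dsync
The output file is opened with O_DSYNC. This behaves like sync, except that sync also synchronizes metadata while dsync does not.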
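The four modes can be compared side by side with a small script. This is a sketch, not the test procedure used in the article: TESTDIR, the file name and the 16 MiB size are illustrative assumptions, and direct I/O may be rejected on some file systems, which the script reports instead of failing.

```shell
#!/bin/sh
# TESTDIR is a hypothetical variable; point it at the file system under test.
TESTDIR=${TESTDIR:-$(mktemp -d)}
f="$TESTDIR/dd_oflag_test.$$"
modes_ok=0

run_dd() {
    label=$1; shift
    # dd prints its throughput summary as the last line of stderr.
    if out=$(dd if=/dev/zero of="$f" bs=1M count=16 "$@" 2>&1); then
        echo "$label: $(printf '%s\n' "$out" | tail -n 1)"
        modes_ok=$((modes_ok + 1))
    else
        echo "$label: failed here ($(printf '%s\n' "$out" | tail -n 1))"
    fi
    rm -f "$f"
}

run_dd "buffered (no oflag)"
run_dd "oflag=direct" oflag=direct
run_dd "oflag=dsync"  oflag=dsync
run_dd "oflag=sync"   oflag=sync
```

Absolute numbers depend heavily on the device, the file system and concurrent load, so only the relative ordering on one idle machine is meaningful.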
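A real-world case
A customer has two identically configured machines running a database workload and ran dd performance tests on both. The customer initially reported a large gap between the dd results on the two machines, as follows:
Primary server, with the workload running
The customer also reported that writes to the /tmp file system on the primary server were comparatively fast
Standby server, with no workload running.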
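Analysis
1. The fast /tmp write used bs=1M together with fsync (this parameter in principle demands a physical write, but the test target was the /tmp file system, which is somewhat special).
For example, the same parameters behave somewhat differently when run under /tmp and under /home:
2. The large difference between the two machines is suspected to be caused mainly by the workload.
3. The recommendation is therefore to test a file system other than /tmp, without the workload running, and to use oflag=direct (to exclude the effect of the system cache). The results were as follows: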
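For reference, a check along the lines of points 1 and 3 might look like the sketch below. It is not the customer's actual command or output: TARGET, the file name and the 64 MiB size are assumptions made for illustration, and GNU stat is assumed. On systems where /tmp is a memory-backed file system such as tmpfs, a "physical write" never reaches a disk, which is one way /tmp can be special.

```shell
#!/bin/sh
# TARGET is a hypothetical directory on the file system to be measured
# (for example somewhere under /home); it defaults to the current directory.
TARGET=${TARGET:-.}

# Show which file system type backs /tmp and the target directory.
tmp_fs=$(stat -f -c %T /tmp)
target_fs=$(stat -f -c %T "$TARGET")
echo "/tmp is on: $tmp_fs"
echo "$TARGET is on: $target_fs"

f="$TARGET/dd_direct_test.$$"
# Direct write that bypasses the page cache; some file systems reject it.
if dd if=/dev/zero of="$f" bs=1M count=64 oflag=direct; then
    result=ok
else
    result=unsupported
fi
rm -f "$f"
echo "direct write: $result"
```

Running the same script on both machines while the workload is stopped keeps the comparison limited to the storage path itself.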