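HBase Performance Testing: Read/Write P999 Latency Benchmarking in Practice

When using HBase, you need a clear picture of how the server side performs; this matters both for using HBase sensibly and for tuning it. It is therefore worth running some baseline benchmarks before putting HBase to use, and read/write P99/P999 latency is one of the key metrics for HBase performance. This article first introduces PerformanceEvaluation, the benchmarking tool that ships with HBase, and then uses it to measure P999 latency on the HBase read and write paths.

I. HBase PE Parameters

PerformanceEvaluation (PE for short; full class name org.apache.hadoop.hbase.PerformanceEvaluation) is the benchmarking tool bundled with HBase. It currently focuses on latency tests for random and sequential reads and writes. Run bin/hbase pe to use it directly: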

[root@xxx ~]$ hbase pe
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
  <OPTIONS> [-D<property=value>]* <command> <nclients>

Options:
 nomapred        Run multiple clients using threads (rather than use mapreduce)
 rows            Rows each client runs. Default: 1048576
 size            Total size in GiB. Mutually exclusive with --rows. Default: 1.0.
 sampleRate      Execute test on a sample of total rows. Only supported by randomRead. Default: 1.0
 traceRate       Enable HTrace spans. Initiate tracing every N rows. Default: 0
 table           Alternate table name. Default: 'TestTable'
 multiGet        If >0, when doing RandomRead, perform multiple gets instead of single gets. Default: 0
 compress        Compression type to use (GZ, LZO, ...). Default: 'NONE'
 flushCommits    Used to determine if the test should flush the table. Default: false
 writeToWAL      Set writeToWAL on puts. Default: True
 autoFlush       Set autoFlush on htable. Default: False
 oneCon          all the threads share the same connection. Default: False
 presplit        Create presplit table. If a table with same name exists, it'll be deleted and recreated (instead of verifying count of its existing regions). Recommended for accurate perf analysis (see guide). Default: disabled
 inmemory        Tries to keep the HFiles of the CF inmemory as far as possible. Not guaranteed that reads are always served from memory.  Default: false
 usetags         Writes tags along with KVs. Use with HFile V3. Default: false
 numoftags       Specify the no of tags that would be needed. This works only if usetags is true. Default: 1
 filterAll       Helps to filter out all the rows on the server side there by not returning any thing back to the client.  Helps to check the server side performance.  Uses FilterAllFilter internally. 
 latency         Set to report operation latencies. Default: False
 bloomFilter      Bloom filter type, one of [NONE, ROW, ROWCOL]
 blockEncoding   Block encoding to use. Value should be one of [NONE, PREFIX, DIFF, FAST_DIFF, PREFIX_TREE]. Default: NONE
 valueSize       Pass value size to use: Default: 1000
 valueRandom     Set if we should vary value size between 0 and 'valueSize'; set on read for stats on size: Default: Not set.
 valueZipf       Set if we should vary value size between 0 and 'valueSize' in zipf form: Default: Not set.
 period          Report every 'period' rows: Default: opts.perClientRunRows / 10 = 104857
 multiGet        Batch gets together into groups of N. Only supported by randomRead. Default: disabled
 addColumns      Adds columns to scans/gets explicitly. Default: true
 replicas        Enable region replica testing. Defaults: 1.
 splitPolicy     Specify a custom RegionSplitPolicy for the table.
 randomSleep     Do a random sleep before each get between 0 and entered value. Defaults: 0
 columns         Columns to write per row. Default: 1
 caching         Scan caching to use. Default: 30

 Note: -D properties will be applied to the conf used. 
  For example: 
   -Dmapreduce.output.fileoutputformat.compress=true
   -Dmapreduce.task.timeout=60000

Command:
 append          Append on each row; clients overlap on keyspace so some concurrent operations
 checkAndDelete  CheckAndDelete on each row; clients overlap on keyspace so some concurrent operations
 checkAndMutate  CheckAndMutate on each row; clients overlap on keyspace so some concurrent operations
 checkAndPut     CheckAndPut on each row; clients overlap on keyspace so some concurrent operations
 filterScan      Run scan test using a filter to find a specific row based on it's value (make sure to use --rows=20)
 increment       Increment on each row; clients overlap on keyspace so some concurrent operations
 randomRead      Run random read test
 randomSeekScan  Run random seek and scan 100 test
 randomWrite     Run random write test
 scan            Run scan test (read every row)
 scanRange10     Run random seek scan with both start and stop row (max 10 rows)
 scanRange100    Run random seek scan with both start and stop row (max 100 rows)
 scanRange1000   Run random seek scan with both start and stop row (max 1000 rows)
 scanRange10000  Run random seek scan with both start and stop row (max 10000 rows)
 sequentialRead  Run sequential read test
 sequentialWrite Run sequential write test

Args:
 nclients        Integer. Required. Total number of clients (and HRegionServers) running. 1 <= value <= 500
Examples:
 To run a single client doing the default 1M sequentialWrites:
 $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
 To run 10 clients doing increments over ten rows:
 $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred increment 10

Running it without any arguments prints the usage text above. The basic invocation is:

hbase pe <OPTIONS> [-D<property=value>]* <command> <nclients>

A few commonly used and important parameters:

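nomapred: chooses between running the clients as MapReduce tasks and running them as local threads. Local multithreading is the usual choice; add --nomapred to the command.
oneCon: whether all threads share a single Connection. The default is false, in which case every thread creates its own HBase Connection, which is not a sensible setup; set it to true with --oneCon=true.
valueSize: size in bytes of each value written to HBase. Default: 1000. Set it to match the size of your real field values, for example --valueSize=100.
table: name of the test table. Default: TestTable.
rows: number of rows each thread processes. Default: 1048576. Set it as needed, for example --rows=100000. This is a per-thread count, so the total is rows times the number of threads: 10 writer threads put 100000 * 10 = 1,000,000 rows into HBase.
size: total size of the test data in GiB, as the usage text above states. Default: 1. Mutually exclusive with rows; do not set both.
compress: compression algorithm for the table. Default: NONE (no compression). Set it to suit your setup, for example --compress=SNAPPY. It is also a way to compare how different compression algorithms affect read and write performance.
presplit: number of pre-split regions for the table. Pick a reasonable value based on the number of RegionServers so that hotspots do not skew the results, for example --presplit=10.
autoFlush: the autoFlush property for writes. Default: false. Writes go through BufferedMutator, so with autoFlush disabled they are sent in batches. To measure single-put latency, set --autoFlush=true.
caching: the caching property for Scan reads. Default: 30. Set it to match real usage, for example --caching=100.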
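
The per-thread arithmetic for rows can be sketched in shell. The helper name total_rows is hypothetical, and the numbers are the ones from the rows example; the byte figure counts raw values only, so it is a lower bound rather than the on-disk size.

```shell
#!/usr/bin/env bash
# total_rows: rows per client (--rows) times number of clients (<nclients>).
# Hypothetical helper for illustration; not part of HBase.
total_rows() {
  echo $(( $1 * $2 ))
}

rows_per_client=100000   # --rows=100000
clients=10               # <nclients>
value_size=100           # --valueSize=100, in bytes

rows=$(total_rows "$rows_per_client" "$clients")
echo "total rows: ${rows}"
# Raw value bytes only; row keys, column names and timestamps add more.
echo "raw value bytes: $(( rows * value_size ))"
```

With these inputs the script reports 1000000 total rows and 100000000 raw value bytes.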

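The remaining parameters can usually be left at their defaults or adjusted to your scenario, so they are not covered here. <command> is the type of read or write test PE should run, such as randomRead, randomWrite, sequentialRead, or sequentialWrite; see the full list above. <nclients> is the number of client threads to start.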
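
One way to keep these options consistent across runs is to assemble the command line in a small wrapper and print it before running it. This is only a sketch: build_pe_cmd is a hypothetical helper, not part of HBase, and it uses nothing beyond the flags described above.

```shell
#!/usr/bin/env bash
# build_pe_cmd: print an "hbase pe" command line for review (dry run).
# Hypothetical helper; arguments are
#   table, rows per client, value size (bytes), regions, PE command, clients.
build_pe_cmd() {
  local table=$1 rows=$2 value_size=$3 regions=$4 test_cmd=$5 nclients=$6
  echo "hbase pe --nomapred --oneCon=true --table=${table}" \
       "--rows=${rows} --valueSize=${value_size} --compress=SNAPPY" \
       "--presplit=${regions} --autoFlush=true ${test_cmd} ${nclients}"
}

# Prints the command only; nothing is executed against the cluster.
build_pe_cmd rw_test_1 1000000 100 16 randomWrite 16
```

Printing first makes it easy to check the table name and region count before a run, since presplit deletes and recreates an existing table of the same name.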

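II. HBase Read/Write P999 Benchmark

Cluster environment

The test cluster has 2 HMasters and 8 RegionServer nodes. Each server has a 24-core CPU, 128 GB of memory, and six 1 TB HDDs. The HBase heap is 16 GB and the version is 1.2.0-cdh5.11.0. This is therefore a P999 benchmark of HBase 1.x; the same procedure applies to HBase 2.x.

Test cases

The cases below measure latency for randomWrite, sequentialWrite, randomRead, and sequentialRead, and report P99 and P999 latency in this environment for reference.

Every case uses PE's local multithreaded mode (--nomapred). The test table has 16 regions, uses Snappy compression, and stores 100-byte values, and each test runs 16 threads to match. The write commands pass --autoFlush=true. When PE finishes, it prints the latency statistics for each thread separately; the output of one thread is shown for each case:

1. randomWrite

Each thread randomly writes 1,000,000 rows to the rw_test_1 table: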

[root@xxx ~]$ hbase pe --nomapred --oneCon=true --table=rw_test_1 --rows=1000000 --valueSize=100 --compress=SNAPPY --presplit=16 --autoFlush=true randomWrite 16

20/02/22 15:06:07 INFO hbase.PerformanceEvaluation: Latency (us) : mean=186.42, min=0.00, max=594880.00, stdDev=6981.60, 50th=1.00, 75th=2.00, 95th=28.00, 99th=1020.00, 99.9th=3941.00, 99.99th=381319.92, 99.999th=503455.66
20/02/22 15:06:07 INFO hbase.PerformanceEvaluation: Num measures (latency) : 1000000
20/02/22 15:06:07 INFO hbase.PerformanceEvaluation: Mean      = 186.42
Min       = 0.00
Max       = 594880.00
StdDev    = 6981.60
50th      = 1.00
75th      = 2.00
95th      = 28.00
99th      = 1020.00
99.9th    = 3941.00
99.99th   = 381319.92
99.999th  = 503455.66

Result: in this case the P999 latency of HBase random writes is about 4 ms (99.9th = 3941 us).
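
PE prints one "Latency (us)" line per thread, and only one thread's line is shown here. A rough sketch for pulling the 99.9th field out of every such line and reporting the worst thread in milliseconds follows; worst_p999_ms is a hypothetical helper, and it assumes the log line format shown above.

```shell
#!/usr/bin/env bash
# worst_p999_ms: read PE log text on stdin and print the largest 99.9th
# value found on any "Latency (us)" line, converted from us to ms.
# Hypothetical helper; assumes the line format shown in this article.
worst_p999_ms() {
  grep 'Latency (us)' \
    | sed -n 's/.* 99\.9th=\([0-9.]*\),.*/\1/p' \
    | awk 'BEGIN { max = -1 }
           $1 + 0 > max { max = $1 + 0 }
           END { if (max >= 0) printf "%.2f\n", max / 1000 }'
}

# Demo with the randomWrite line from this article.
worst_p999_ms <<'EOF'
20/02/22 15:06:07 INFO hbase.PerformanceEvaluation: Latency (us) : mean=186.42, min=0.00, max=594880.00, stdDev=6981.60, 50th=1.00, 75th=2.00, 95th=28.00, 99th=1020.00, 99.9th=3941.00, 99.99th=381319.92, 99.999th=503455.66
EOF
```

For the single line above this prints 3.94. Against a saved log containing all 16 threads' lines, the same function would report the slowest thread's P999.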

2. sequentialWrite

Each thread sequentially writes to the rw_test_2 table with --size=1 (1,048,576 rows per thread, per the log below):

[root@xxx ~]$ hbase pe --nomapred --oneCon=true --table=rw_test_2 --size=1 --valueSize=100 --compress=SNAPPY --presplit=16 --autoFlush=true sequentialWrite 16

20/02/22 16:24:49 INFO hbase.PerformanceEvaluation: Latency (us) : mean=220.51, min=0.00, max=1440185.00, stdDev=10022.38, 50th=1.00, 75th=2.00, 95th=132.00, 99th=396.00, 99.9th=1152.00, 99.99th=515707.37, 99.999th=917447.01
20/02/22 16:24:49 INFO hbase.PerformanceEvaluation: Num measures (latency) : 1048576
20/02/22 16:24:49 INFO hbase.PerformanceEvaluation: Mean      = 220.51
Min       = 0.00
Max       = 1440185.00
StdDev    = 10022.38
50th      = 1.00
75th      = 2.00
95th      = 132.00
99th      = 396.00
99.9th    = 1152.00
99.99th   = 515707.37
99.999th  = 917447.01

Result: in this case the P999 latency of HBase sequential writes is about 1.2 ms (99.9th = 1152 us).

3. randomRead

Random reads against the rw_test_2 table:

[root@xxx ~]$ hbase pe --nomapred --oneCon=true --table=rw_test_2 --size=1 --valueSize=100 randomRead 16

20/02/22 16:53:48 INFO hbase.PerformanceEvaluation: Latency (us) : mean=748.70, min=74.00, max=2161876.00, stdDev=5055.01, 50th=289.00, 75th=364.00, 95th=2665.00, 99th=4579.00, 99.9th=78024.00, 99.99th=100495.98, 99.999th=150378.50
20/02/22 16:53:48 INFO hbase.PerformanceEvaluation: Num measures (latency) : 1048576
20/02/22 16:53:48 INFO hbase.PerformanceEvaluation: Mean      = 748.70
Min       = 74.00
Max       = 2161876.00
StdDev    = 5055.01
50th      = 289.00
75th      = 364.00
95th      = 2665.00
99th      = 4579.00
99.9th    = 78024.00
99.99th   = 100495.98
99.999th  = 150378.50

Result: in this case the P999 latency of HBase random reads is about 78 ms (99.9th = 78024 us), under 100 ms.

4. sequentialRead

Sequential reads against the rw_test_2 table:

[root@xxx ~]$ hbase pe --nomapred --oneCon=true --table=rw_test_2 --size=1 --valueSize=100 sequentialRead 16

20/02/22 17:08:41 INFO hbase.PerformanceEvaluation: Latency (us) : mean=593.44, min=86.00, max=183676.00, stdDev=4299.28, 50th=302.00, 75th=398.00, 95th=633.00, 99th=932.00, 99.9th=75718.98, 99.99th=93035.20, 99.999th=135947.24
20/02/22 17:08:41 INFO hbase.PerformanceEvaluation: Num measures (latency) : 1048576
20/02/22 17:08:42 INFO hbase.PerformanceEvaluation: Mean      = 593.44
Min       = 86.00
Max       = 183676.00
StdDev    = 4299.28
50th      = 302.00
75th      = 398.00
95th      = 633.00
99th      = 932.00
99.9th    = 75718.98
99.99th   = 93035.20
99.999th  = 135947.24

Result: in this case the P999 latency of HBase sequential reads is about 76 ms (99.9th = 75718.98 us).
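
To compare the four cases side by side, the 99th and 99.9th values from the log lines above can be converted from microseconds to milliseconds. The sketch below hard-codes those values; us_to_ms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# us_to_ms: convert microseconds to milliseconds, two decimal places.
# Hypothetical helper for illustration.
us_to_ms() {
  awk -v us="$1" 'BEGIN { printf "%.2f", us / 1000 }'
}

# case, 99th (us), 99.9th (us): copied from the log lines in this article.
results="randomWrite 1020.00 3941.00
sequentialWrite 396.00 1152.00
randomRead 4579.00 78024.00
sequentialRead 932.00 75718.98"

printf '%-16s %10s %10s\n' "case" "P99(ms)" "P999(ms)"
while read -r name p99 p999; do
  printf '%-16s %10s %10s\n' "$name" "$(us_to_ms "$p99")" "$(us_to_ms "$p999")"
done <<EOF
$results
EOF
```

The table makes the gap visible at a glance: in this environment, P999 stays in the low single-digit milliseconds for writes and in the tens of milliseconds for reads.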

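III. Summary

This article showed how to use PE, the tool bundled with HBase, to test read and write latency. PE is mainly for latency metrics such as P999; it does not currently cover throughput metrics such as per-node TPS (a later article will cover the YCSB benchmark). Hopefully these numbers give a rough, quantitative sense of how fast HBase reads and writes are.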

Last edited: 2021.04.08 22:38:14
