HBase Major Compaction

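Foreword: A major compaction merges all the StoreFiles of each Store (column family) in a Region into a single StoreFile. During the merge, rows that were marked as deleted and versions that have expired are physically removed. A major compaction typically runs once a week. Because it puts a heavy load on disk I/O and network bandwidth across the whole cluster while it runs, the usual recommendation is to set hbase.hregion.majorcompaction to 0 to disable the automatic trigger, and to run major compactions from a scheduled script at night, when cluster load is low.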
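As a minimal sketch of the "run it at night" advice, the script can be started from cron and can additionally refuse to run outside a low-load window. The crontab line, the paths in it, the `in_window` helper and the 01:00 to 05:00 window are all assumptions made for illustration; none of them come from the original script.

```shell
#!/bin/bash
# Assumed crontab entry (hypothetical paths), Saturdays at 02:00:
#   0 2 * * 6 /path/to/major_compact.sh >> /path/to/major_compact.log 2>&1

# in_window HOUR START END
# Succeeds if HOUR lies in [START, END). The window may wrap past
# midnight, e.g. START=22 END=4 covers 22:00 through 03:59.
in_window() {
    # 10# forces base 10 so that values such as "08" are not read as octal
    local hour=$((10#$1)) start=$((10#$2)) end=$((10#$3))
    if (( start <= end )); then
        (( hour >= start && hour < end ))
    else
        (( hour >= start || hour < end ))
    fi
}
```

A guard such as `in_window "$(date +%H)" 1 5 || exit 0` near the top of the compaction script would then skip the run if it is started by hand during the day.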

#!/bin/bash

#author:Wang Kuan

#date:2019-05-16

#A major compaction trades a short burst of I/O and bandwidth consumption for lower latency on subsequent queries

metrics_status="metrics_status.txt"

metrics_file_count="metrics_file_count.txt"

metrics_filecount_gt="metrics_filecount_gt.txt"

metrics_filecount_sorted="metrics_filecount_sorted.txt"

tables_need_compact="tables_need_compact.txt"

rm -rf $metrics_status $metrics_file_count $metrics_filecount_gt $metrics_filecount_sorted $tables_need_compact

compact_num=10

storefile_num=40

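echo "status 'detailed'" | hbase shell > $metrics_status # dump detailed HBase cluster status

sed -i '1,12d' $metrics_status # drop lines 1-12 (the shell startup banner)

sed -i '$d' $metrics_status # drop the last line; the trailing lines of output differ between versions

sed -i '$d' $metrics_status # drop the new last line as well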

awk -F '[",=]+' '{ # field separator: one or more of " , =

if(NR%2==1)

print $2 > "'$metrics_file_count'" # odd lines: region name (its first field, i.e. the table name)

else

print $4 > "'$metrics_file_count'" # even lines: number of storefiles

}' $metrics_status

sed -i 'N; s/\n/ /' "$metrics_file_count" # join each pair of lines with a space

awk '{if ($2 >= '$storefile_num') print $0 > "'$metrics_filecount_gt'"}' $metrics_file_count # keep entries with at least $storefile_num (40) storefiles

for i in `awk '{print $1}' $metrics_filecount_gt | sort -u`

do

awk -v t="$i" '$1 == t' $metrics_filecount_gt | sort -n -k 2 | tail -1 >> $tables_need_compact # for each name, keep the row with the most storefiles

done

sort -r -n -k 2 $tables_need_compact -o $tables_need_compact # sort by storefile count, descending, and write back in place

for i in `head -n $compact_num $tables_need_compact|awk '{print $1}'`

do

echo "major_compact '$i'"|hbase shell

done
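The selection logic of the script (threshold filter, largest storefile count per name, then the top N names) can also be sketched as a single function that needs no intermediate files. This is an illustrative alternative under assumptions, not the author's method: `pick_tables` is a hypothetical helper, and it expects the same "name storefile_count" lines that the script writes to `$metrics_file_count`.

```shell
#!/bin/bash
# pick_tables MIN_FILES TOP_N
# Reads "name storefile_count" lines on stdin. For each name it keeps the
# largest count that is >= MIN_FILES, then prints the TOP_N names with the
# highest counts, one per line, highest first.
pick_tables() {
    local min_files=$1 top_n=$2
    awk -v min="$min_files" '
        # $2 + 0 forces a numeric comparison
        $2 + 0 >= min && (!($1 in max) || $2 + 0 > max[$1]) { max[$1] = $2 + 0 }
        END { for (name in max) print name, max[name] }
    ' | sort -k2,2nr -k1,1 | head -n "$top_n" | awk '{ print $1 }'
}
```

With the variables from the script, `pick_tables "$storefile_num" "$compact_num" < "$metrics_file_count"` would produce the list of names to pass to `major_compact`. The comparison is numeric throughout, so a count of 100 ranks above 45.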
