HBase Cluster Smooth Migration Steps

Test environment

                       CDH version   HBase version
Test source cluster    5.15.1        1.2.0
Test target cluster    6.2.0         2.1.0

Production migration environment

                             CDH version   HBase version
Production source cluster   5.9.3         1.2.0
Production target cluster   6.2.0         2.1.0

Pre-migration preparation

Source cluster configuration

Snapshot configuration (skip if already configured)

# Ensure snapshots are enabled
hbase.snapshot.enabled=true

Replication configuration (skip if already configured)

# Enable replication
hbase.replication=true

Source cluster table inventory:

List namespaces:

# Check whether any user-defined namespaces exist
hbase(main):001:0> list_namespace
NAMESPACE                                                                                                                                                                                   
default                                                                                                                                                                                     
hbase                                                                                                                                                                                       
2 row(s) in 0.3830 seconds

List the tables under each namespace

# Check whether namespace 'hbase' contains tables that need to be migrated
hbase(main):002:0> list_namespace_tables 'hbase'
TABLE                                                                                                                                                                                       
meta                                                                                                                                                                                        
namespace                                                                                                                                                                                   
2 row(s) in 0.0260 seconds
# Tables in the 'hbase' namespace do not need to be migrated

Determine the tables to migrate

# The tables to migrate are those under namespace 'default' (183 tables in total)
hbase(main):003:0> list_namespace_tables 'default'
TABLE                                                                                                                                                                                       
air_message_record                                                                                                                                                                          
ali_upload_records                                                                                                                                                                          
ali_upload_records_sec                                                                                                                                                                      
alibaba_records                                                                                                                                                                             
alipay_records                                                                                                                                                                              
alitaobao_records                                                                                                                                                                           
attachments                                                                                                                                                                                 
audit_logs                                                                                                                                                                                  
audit_message                                                                                                                                                                               
.................................                                                                                                                                                               
zhima_feedbacks                                                                                                                                                                             
zhima_history_feedbacks                                                                                                                                                                     
183 row(s) in 0.2350 seconds

Create the same tables on the target cluster as on the source cluster

Method 1:

Reference:
https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_bdr_hbase_replication.html#concept_xjp_2tt_nw
If the table to be replicated does not yet exist on the destination cluster, you must create it. The easiest way to do this is to extract the schema using HBase Shell.

On the source cluster, describe the table using HBase Shell (the output below has been reformatted for readability):

hbase> describe acme_users

Table acme_users is ENABLED
acme_users
COLUMN FAMILIES DESCRIPTION
{NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}

Copy the output and make the following changes:
  • For the TTL, change FOREVER to org.apache.hadoop.hbase.HConstants::FOREVER.
  • Add the word CREATE before the table name.
  • Remove the line COLUMN FAMILIES DESCRIPTION and everything above the table name.

The result is a command like the following:

create 'acme_users',
{NAME => 'user', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => org.apache.hadoop.hbase.HConstants::FOREVER, KEEP_DELETED_CELLS => 'FALSE',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}

On the destination cluster, paste the command from the previous step into HBase Shell to create the table.

Method 2: copy the table description from the HBase web UI, then create the table on the target cluster:

(screenshot omitted)

Quick script to create all production tables:

(omitted)
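
The author's script is omitted above. The following is only a rough sketch of one way to batch this, assuming a file tables.txt holding the 183 table names and a file create_tables.txt holding the create statements produced with Method 1 (both file names are illustrative, not from the original):

# Hypothetical sketch, not the author's script:
# dump the schema of every table from the source cluster
while read -r table; do
  echo "describe '${table}'"
done < tables.txt | hbase shell -n > describe_output.txt
# After converting the describe output into create statements (see Method 1 above),
# replay them on the target cluster:
hbase shell -n create_tables.txt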

Add the target cluster as a peer on the source cluster (incremental replication)

# add_peer 'ID', 'CLUSTER_KEY', where CLUSTER_KEY has the form:
#   hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
# For example:
add_peer '1', CLUSTER_KEY => "10.10.15.56,10.10.15.18,10.10.15.79,10.10.15.84,10.10.15.88:2181:/hbase"
# NAMESPACES => ["default"] and SERIAL => true are only supported in newer HBase versions
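
After adding the peer, its state can be checked from the HBase shell on the source cluster (peer id '1' matches the example above):

# List configured peers and confirm the new peer is ENABLED
list_peers
# Enable or disable the peer if needed
enable_peer '1'
disable_peer '1'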

Why serial replication is needed in newer versions

Reference: https://hbase.apache.org/book.html#_serial_replication

  • A put followed by a delete of the same cell is written to the source cluster.
  • Because of region moves or RegionServer failures, the two mutations are pushed to the peer cluster by different replication source threads.
  • If the delete reaches the peer cluster before the put, and a flush plus major compaction occurs on the peer before the put arrives, the delete marker is collected and the put survives on the peer; on the source cluster, however, the put is masked by the delete, so the data on the two clusters is inconsistent.
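
On an HBase 2.1+ source cluster, serial replication could be requested when the peer is added. This is only for reference, since the 1.2.0 source cluster used here does not support it; the ZooKeeper hosts below are illustrative:

# HBase 2.x shell only; SERIAL is not available in HBase 1.2
add_peer '1', CLUSTER_KEY => "zk1,zk2,zk3:2181:/hbase", SERIAL => true
# An existing peer can also be switched to serial mode
set_peer_serial '1', true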

Table-by-table migration steps (incremental)

Enable table-level replication to start dual writes

# Set REPLICATION_SCOPE => '1' for every column family (cf1, cf2, ...)
hbase> alter 'tableName', {NAME => 'cf1', REPLICATION_SCOPE => '1'},{NAME => 'cf2', REPLICATION_SCOPE => '1'}...
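
Once the replication scope is set, the replication metrics can be inspected from the source cluster's shell to confirm that edits are being shipped (output format varies between HBase versions):

# Show per-RegionServer replication source/sink metrics
status 'replication'
# Or only the source side
status 'replication', 'source'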

Verify replication (the source and target versions differ here, so this check cannot be run; skipped):

# Command:
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication [--starttime=timestamp1] [--stoptime=timestamp] [--families=comma separated list of families]  peerId tableName
 
 
# Usage reference:
bash-4.2$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication
Usage: verifyrep [--starttime=X] [--endtime=Y] [--families=A] [--row-prefixes=B] [--delimiter=] [--recomparesleep=] [--batch=] [--verbose] [--sourceSnapshotName=P] [--sourceSnapshotTmpDir=Q] [--peerSnapshotName=R] [--peerSnapshotTmpDir=S] [--peerFSAddress=T] [--peerHBaseRootAddress=U] <peerid> <tablename>
 
Options:
 starttime    beginning of the time range
              without endtime means from starttime to forever
 endtime      end of the time range
 versions     number of cell versions to verify
 batch        batch count for scan, note that result row counts will no longer be actual number of rows when you use this option
 raw          includes raw scan if given in options
 families     comma-separated list of families to copy
 row-prefixes comma-separated list of row key prefixes to filter on
 delimiter    the delimiter used in display around rowkey
 recomparesleep   milliseconds to sleep before recompare row, default value is 0 which disables the recompare.
 verbose      logs row keys of good rows
 sourceSnapshotName  Source Snapshot Name
 sourceSnapshotTmpDir Tmp location to restore source table snapshot
 peerSnapshotName  Peer Snapshot Name
 peerSnapshotTmpDir Tmp location to restore peer table snapshot
 peerFSAddress      Peer cluster Hadoop FS address
 peerHBaseRootAddress  Peer cluster HBase root location
 
Args:
 peerid       Id of the peer used for verification, must match the one given for replication
 tablename    Name of the table to verify
 
Examples:
 To verify the data replicated from TestTable for a 1 hour window with peer #5
 $ hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1265875194289 --endtime=1265878794289 5 TestTable

Sample verification output:

(screenshot of the verification result omitted)

Take a snapshot (full data migration)

snapshot 'sourceTable', 'snapshotName'
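
For example, snapshotting one of the tables listed earlier and confirming it exists (the table and snapshot names here are illustrative only):

hbase> snapshot 'air_message_record', 'air_message_record.snapshot07021143'
hbase> list_snapshots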

Export the snapshot to the target cluster

# Command:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot xxx.snapshot -copy-to hdfs://xxx:8020/hbase -mappers XX -overwrite -bandwidth 5
# Run as the hdfs user
# e.g.:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot xinyan_black_records_sec.snapshot07021143 -copy-to hdfs://10.10.15.56:8020/hbase -chuser hbase -chgroup hbase -chmod 777 -overwrite -bandwidth 5
# Usage reference:
Usage: bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot [options]
 where [options] are:
  -h|-help                Show this help and exit.
  -snapshot NAME          Snapshot to restore.
  -copy-to NAME           Remote destination hdfs://
  -copy-from NAME         Input folder hdfs:// (default hbase.rootdir)
  -no-checksum-verify     Do not verify checksum, use name+length only.
  -no-target-verify       Do not verify the integrity of the exported snapshot.
  -overwrite              Rewrite the snapshot manifest if already exists
  -chuser USERNAME        Change the owner of the files to the specified one.
  -chgroup GROUP          Change the group of the files to the specified one.
  -chmod MODE             Change the permission of the files to the specified one.
  -mappers                Number of mappers to use during the copy (mapreduce.job.maps).
  -bandwidth              Limit bandwidth to this value in MB/second.
 
Examples:
  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase \
    -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16
 
  hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot MySnapshot -copy-from hdfs://srv2:8082/hbase \
    -copy-to hdfs://srv1:50070/hbase

Note: -bandwidth is specified in MB/s (megabytes per second) and limits the traffic of a single Region Server; with n Region Servers the total traffic can reach n × bandwidth.

Bulk-load the snapshot into the target cluster

Bulk-load source location

Snapshot metadata is stored in the .hbase-snapshot directory under the HBase root directory (/hbase/.hbase-snapshot). Each snapshot has its own directory that includes all the references to the HFiles, logs, and metadata needed to restore the table. The HFiles required by the snapshot are in
/hbase/data/<namespace>/<tableName>/<regionName>/<familyName>/
if the table is still using them; otherwise, they are in
/hbase/archive/data/<namespace>/<tableName>/<regionName>/<familyName>/.
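
Before running the bulk load, the exported files can be located on the target cluster's HDFS (the path pattern follows the layout above; the table name is illustrative):

# Check that the exported HFiles landed under the archive path on the target cluster
hdfs dfs -ls -R /hbase/archive/data/default/air_message_record/
# Snapshot metadata
hdfs dfs -ls /hbase/.hbase-snapshot/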

Adjust the bulk-load configuration parameter (already changed on the new cluster)

(screenshot omitted: raise hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily on the target cluster)

Import command (CDH 6 supports whole-table import) (full data):

# By default at most 32 HFiles per region per family are allowed; raise hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily as noted above:
./bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles \
-Dcreate.table=no \
/hbase/archive/data/default/${TableName}/ \
${TableName} -loadTable


# Example:
hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles -Dcreate.table=no /hbase/archive/data/default/${TableName}/ ${TableName} -loadTable

Verify the migrated tables on the target cluster

Check HBase region consistency and table integrity

# Command
./bin/hbase hbck ${tableName} -checkCorruptHFiles -sidelineCorruptHFiles -boundaries -summary  -exclusive
# Usage reference:
-----------------------------------------------------------------------
NOTE: As of HBase version 2.0, the hbck tool is significantly changed.
In general, all Read-Only options are supported and can be used
safely. Most -fix/ -repair options are NOT supported. Please see usage
below for details on which options are not supported.
-----------------------------------------------------------------------
 
Usage: fsck [opts] {only tables}
 where [opts] are:
   -help Display help options (this)
   -details Display full report of all regions.
   -timelag <timeInSeconds>  Process only regions that have not experienced any metadata updates in the last <timeInSeconds> seconds.
   -sleepBeforeRerun <timeInSeconds>  Sleep this many seconds before checking if the fix worked if run with -fix
   -summary Print only summary of the tables and status.
   -metaonly Only check the state of the hbase:meta table.
   -sidelineDir <hdfs://>  HDFS path to backup existing meta.
   -boundaries Verify that regions boundaries are the same between META and store files.
   -exclusive Abort if another hbck is exclusive or fixing.
 
  Datafile Repair options: (expert features, use with caution!)
   -checkCorruptHFiles     Check all Hfiles by opening them to make sure they are valid
   -sidelineCorruptHFiles  Quarantine corrupted HFiles.  implies -checkCorruptHFiles
 
 Replication options
   -fixReplication   Deletes replication queues for removed peers
 
  Metadata Repair options supported as of version 2.0: (expert features, use with caution!)
   -fixVersionFile   Try to fix missing hbase.version file in hdfs.
   -fixReferenceFiles  Try to offline lingering reference store files
   -fixHFileLinks  Try to offline lingering HFileLinks
   -noHdfsChecking   Don't load/check region info from HDFS. Assumes hbase:meta region info is good. Won't check/fix any HDFS issue, e.g. hole, orphan, or overlap
   -ignorePreCheckPermission  ignore filesystem permission pre-check

Verify data consistency in the tables:

Small tables:

Use the HBase shell count command.
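
For example (the table name, interval, and cache values are illustrative; a larger CACHE speeds up the scan):

hbase> count 'air_message_record', INTERVAL => 100000, CACHE => 10000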

Large tables:

# For large tables, verify the row count with a MapReduce job
./bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=true ${tableName}
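
A rough way to compare the two clusters (an assumption, not part of the original procedure) is to run the same RowCounter job on each side and compare the ROWS counter printed with the job counters:

# On each cluster, capture the ROWS counter from the job output (table name is illustrative)
./bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter air_message_record 2>&1 | grep "ROWS="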

Migration complete; monitor the production HBase cluster

Problems encountered

Testing showed that, at a given bandwidth and I/O level, ExportSnapshot of small tables works fine, but large tables fail with HFiles not found under archive; exporting small tables at an even lower bandwidth triggers the same problem.

Solution (resolved):

Use the following ExportSnapshot command; it resolves the failures when migrating large tables:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot ycsbTable02.snapshot -copy-to hdfs://10.10.15.56:8020/hbase/ycsb -chuser hbase -chgroup hbase -chmod 755 -mappers 1 -bandwidth 1 -overwrite

Do not export into the target cluster's /hbase/archive directory; use a different directory instead.

Cause: the HBase master runs a cleanup thread (HFileCleaner) that periodically removes garbage files under archive (it scans every 5 minutes). If the exported snapshot files on the target cluster are not referenced by any table or snapshot, HFileCleaner deletes them.
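
One possible mitigation during the export window (an assumption on my part, not something the original procedure did) is to lengthen the cleaner's grace period on the target cluster so unreferenced files survive longer, via hbase-site.xml:

# Keep archived HFiles for 1 hour instead of the 5-minute default (value in milliseconds)
hbase.master.hfilecleaner.ttl=3600000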