As a big-data practitioner's toolkit goes, Hadoop, HBase, Hive, Spark, and Storm are all fairly mature systems by now; despite poor maintainability and weak version compatibility, they soldier on...
Hive is really just a client, and in the mainstream Spark-on-YARN mode Spark is a client too. As for Storm, after Twitter donated it to Apache its original owners left and handed over the reins, leaving it effectively orphaned. That is why I, too, moved from the Storm streaming platform to KSQL...
So the big-data cluster systems we actually run come down to Hadoop + HBase.
Hadoop consists of two components: the distributed file system HDFS and the resource manager YARN.
This article covers HBase, a column-oriented NoSQL database built on top of HDFS, whose LSM-based storage engine gives it high read/write performance; it has become a key building block of big-data services.
The version chosen here is the Hadoop 3 line. Why?
Because Hadoop 2's compatibility is too weak, and both HBase and Spark are actively embracing Hadoop 3.
There is not much else to introduce, so let's get to the main topic.
Setting up an hbase-2.1 on hadoop-3.1.1 cluster
Due to limited time, neither Hadoop nor HBase is set up with HA; that will be added later.
The process breaks down into the following five steps:
- Create the VMs - createvm.sh
- Package the dependencies - package.sh
- Deploy the cluster - deploy.sh
- Start the cluster - start.sh
- Test the cluster
Project path: https://github.com/clojurians-org/my-env
Script paths: run.sh.d/hbase-example/{createvm.sh, package.sh, deploy.sh, start.sh}
Step 0: Create the VMs - createvm.sh
[larluo@larluo-nixos:~/my-env]$ cat run.sh.d/hbase-example/createvm.sh
set -e
my=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")" > /dev/null && pwd -P) && cd $my/../..
echo -e "\n==== bash nix.sh create-vm nixos-hbase-001" && bash nix.sh create-vm nixos-hbase-001
echo -e "\n==== bash nix.sh create-vm nixos-hbase-002" && bash nix.sh create-vm nixos-hbase-002
echo -e "\n==== bash nix.sh create-vm nixos-hbase-003" && bash nix.sh create-vm nixos-hbase-003
This creates three VMs: 192.168.56.101, 192.168.56.102, 192.168.56.103.
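A quick reachability check before moving on; a minimal sketch, assuming the host-only network answers ping:
for ip in 192.168.56.101 192.168.56.102 192.168.56.103; do
  ping -c 1 -W 1 $ip > /dev/null && echo "$ip up" || echo "$ip unreachable"
done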
Step 1: Package the dependencies - package.sh
[larluo@larluo-nixos:~/my-env]$ cat run.sh.d/hbase-example/package.sh
set -e
my=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")" > /dev/null && pwd -P) && cd $my/../..
echo -e "\n==== bash nix.sh build tgz.hbase-2.1.0" && bash nix.sh build tgz.hbase-2.1.0
echo -e "\n==== bash nix.sh export tgz.nix-2.0.4" && bash nix.sh export tgz.nix-2.0.4
echo -e "\n==== bash nix.sh export nix.rsync-3.1.3" && bash nix.sh export nix.rsync-3.1.3
echo -e "\n==== bash nix.sh export nix.gettext-0.19.8.1" && bash nix.sh export nix.gettext-0.19.8.1
echo -e "\n==== bash nix.sh export nix.openjdk-8u172b11" && bash nix.sh export nix.openjdk-8u172b11
echo -e "\n==== bash nix.sh export nix.zookeeper-3.4.13" && bash nix.sh export nix.zookeeper-3.4.13
echo -e "\n==== bash nix.sh export nix.hadoop-3.1.1" && bash nix.sh export nix.hadoop-3.1.1
HBase ships built against Hadoop 2 by default, so running HBase on Hadoop 3 requires a custom build; the packaging step invokes it automatically.
For lack of time the build uses mvn directly rather than a nix build; this may be reworked later.
[larluo@larluo-nixos:~/my-env]$ cat nix.conf/hbase-2.1.0/build.sh
mvn package -Dhadoop.profile=3.0 -Dhadoop-three.version=3.1.1 -DskipTests assembly:single
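If the build succeeds, the binary tarball should land under hbase-assembly/target (the stock HBase assembly layout; adjust the path if your tree differs), and listing its jars confirms the Hadoop 3 profile took effect:
ls hbase-assembly/target/hbase-2.1.0-bin.tar.gz
tar tzf hbase-assembly/target/hbase-2.1.0-bin.tar.gz | grep hadoop-common
# expect hadoop-common-3.1.1.jar rather than a 2.x jar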
Besides the basic tools (nix, rsync, gettext), the script exports the JDK, ZooKeeper, and Hadoop nix packages; the HBase tarball itself comes from the build step at the top of the script.
Step 2: Deploy the cluster - deploy.sh
[larluo@larluo-nixos:~/my-env]$ cat run.sh.d/hbase-example/deploy.sh
set -e
my=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")" > /dev/null && pwd -P) && cd $my/../..
echo -e "\n==== bash nix.sh create-user 192.168.56.101" && bash nix.sh create-user 192.168.56.101
echo -e "\n==== bash nix.sh create-user 192.168.56.102" && bash nix.sh create-user 192.168.56.102
echo -e "\n==== bash nix.sh create-user 192.168.56.103" && bash nix.sh create-user 192.168.56.103
echo -e "\n==== bash nix.sh install 192.168.56.101 tgz.nix-2.0.4" && bash nix.sh install 192.168.56.101 tgz.nix-2.0.4
echo -e "\n==== bash nix.sh install 192.168.56.102 tgz.nix-2.0.4" && bash nix.sh install 192.168.56.102 tgz.nix-2.0.4
echo -e "\n==== bash nix.sh install 192.168.56.103 tgz.nix-2.0.4" && bash nix.sh install 192.168.56.103 tgz.nix-2.0.4
echo -e "\n==== bash nix.sh install 192.168.56.101 nix.rsync-3.1.3" && bash nix.sh install 192.168.56.101 nix.rsync-3.1.3
echo -e "\n==== bash nix.sh install 192.168.56.102 nix.rsync-3.1.3" && bash nix.sh install 192.168.56.102 nix.rsync-3.1.3
echo -e "\n==== bash nix.sh install 192.168.56.103 nix.rsync-3.1.3" && bash nix.sh install 192.168.56.103 nix.rsync-3.1.3
echo -e "\n==== bash nix.sh install 192.168.56.101 nix.gettext-0.19.8.1" && bash nix.sh install 192.168.56.101 nix.gettext-0.19.8.1
echo -e "\n==== bash nix.sh install 192.168.56.102 nix.gettext-0.19.8.1" && bash nix.sh install 192.168.56.102 nix.gettext-0.19.8.1
echo -e "\n==== bash nix.sh install 192.168.56.103 nix.gettext-0.19.8.1" && bash nix.sh install 192.168.56.103 nix.gettext-0.19.8.1
echo -e "\n==== bash nix.sh install 192.168.56.101 nix.openjdk-8u172b11" && bash nix.sh install 192.168.56.101 nix.openjdk-8u172b11
echo -e "\n==== bash nix.sh install 192.168.56.102 nix.openjdk-8u172b11" && bash nix.sh install 192.168.56.102 nix.openjdk-8u172b11
echo -e "\n==== bash nix.sh install 192.168.56.103 nix.openjdk-8u172b11" && bash nix.sh install 192.168.56.103 nix.openjdk-8u172b11
echo -e "\n==== bash nix.sh install 192.168.56.101 nix.zookeeper-3.4.13" && bash nix.sh install 192.168.56.101 nix.zookeeper-3.4.13
echo -e "\n==== bash nix.sh install 192.168.56.102 nix.zookeeper-3.4.13" && bash nix.sh install 192.168.56.102 nix.zookeeper-3.4.13
echo -e "\n==== bash nix.sh install 192.168.56.103 nix.zookeeper-3.4.13" && bash nix.sh install 192.168.56.103 nix.zookeeper-3.4.13
echo -e "\n==== bash nix.sh install 192.168.56.101 nix.hadoop-3.1.1" && bash nix.sh install 192.168.56.101 nix.hadoop-3.1.1
echo -e "\n==== bash nix.sh install 192.168.56.102 nix.hadoop-3.1.1" && bash nix.sh install 192.168.56.102 nix.hadoop-3.1.1
echo -e "\n==== bash nix.sh install 192.168.56.103 nix.hadoop-3.1.1" && bash nix.sh install 192.168.56.103 nix.hadoop-3.1.1
echo -e "\n==== bash nix.sh import 192.168.56.101 tgz.hbase-2.1.0" && bash nix.sh import 192.168.56.101 tgz.hbase-2.1.0
echo -e "\n==== bash nix.sh import 192.168.56.102 tgz.hbase-2.1.0" && bash nix.sh import 192.168.56.102 tgz.hbase-2.1.0
echo -e "\n==== bash nix.sh import 192.168.56.103 tgz.hbase-2.1.0" && bash nix.sh import 192.168.56.103 tgz.hbase-2.1.0
This step mirrors the packaging step one-to-one, distributing every package to each of the three servers; the same sequence can be collapsed into loops, as sketched below.
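A minimal loop-based sketch of the same deployment, assuming nix.sh behaves exactly as above:
set -e
hosts="192.168.56.101 192.168.56.102 192.168.56.103"
pkgs="tgz.nix-2.0.4 nix.rsync-3.1.3 nix.gettext-0.19.8.1 nix.openjdk-8u172b11 nix.zookeeper-3.4.13 nix.hadoop-3.1.1"
# create the service user on every host first
for host in $hosts; do
  echo -e "\n==== bash nix.sh create-user $host" && bash nix.sh create-user $host
done
# install each package on all hosts, in the same order as above
for pkg in $pkgs; do
  for host in $hosts; do
    echo -e "\n==== bash nix.sh install $host $pkg" && bash nix.sh install $host $pkg
  done
done
# finally, import the custom-built hbase tarball
for host in $hosts; do
  echo -e "\n==== bash nix.sh import $host tgz.hbase-2.1.0" && bash nix.sh import $host tgz.hbase-2.1.0
done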
Step 3: Start the cluster - start.sh
[larluo@larluo-nixos:~/my-env]$ cat run.sh.d/hbase-example/start.sh
set -e
my=$(cd -P -- "$(dirname -- "${BASH_SOURCE-$0}")" > /dev/null && pwd -P) && cd $my/../..
export ZK_ALL="192.168.56.101:2181,192.168.56.102:2181,192.168.56.103:2181"
# start zookeeper-3.4.13
echo -e "\n==== bash nix.sh start 192.168.56.101:2181 zookeeper-3.4.13 --all ${ZK_ALL}" && bash nix.sh start 192.168.56.101:2181 zookeeper-3.4.13 --all ${ZK_ALL}
echo -e "\n==== bash nix.sh start 192.168.56.102:2181 zookeeper-3.4.13 --all ${ZK_ALL}" && bash nix.sh start 192.168.56.102:2181 zookeeper-3.4.13 --all ${ZK_ALL}
echo -e "\n==== bash nix.sh start 192.168.56.103:2181 zookeeper-3.4.13 --all ${ZK_ALL}" && bash nix.sh start 192.168.56.103:2181 zookeeper-3.4.13 --all ${ZK_ALL}
# start hadoop-3.1.1
echo -e "\n==== bash nix.sh start 192.168.56.101:9000 hadoop-3.1.1:namenode" && bash nix.sh start 192.168.56.101:9000 hadoop-3.1.1:namenode
export HDFS_MASTER="192.168.56.101:9000"
echo -e "\n==== bash nix.sh start 192.168.56.101:5200 hadoop-3.1.1:datanode --master ${HDFS_MASTER}" && bash nix.sh start 192.168.56.101:5200 hadoop-3.1.1:datanode --master ${HDFS_MASTER}
echo -e "\n==== bash nix.sh start 192.168.56.102:5200 hadoop-3.1.1:datanode --master ${HDFS_MASTER}" && bash nix.sh start 192.168.56.102:5200 hadoop-3.1.1:datanode --master ${HDFS_MASTER}
echo -e "\n==== bash nix.sh start 192.168.56.103:5200 hadoop-3.1.1:datanode --master ${HDFS_MASTER}" && bash nix.sh start 192.168.56.103:5200 hadoop-3.1.1:datanode --master ${HDFS_MASTER}
# start hbase-2.1.0
echo -e "\n==== bash nix.sh start 192.168.56.101:16010 hbase-2.1.0:master --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}" && bash nix.sh start 192.168.56.101:16010 hbase-2.1.0:master --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}
echo -e "\n==== bash nix.sh start 192.168.56.101:16030 hbase-2.1.0:regionserver --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}" && bash nix.sh start 192.168.56.101:16030 hbase-2.1.0:regionserver --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}
echo -e "\n==== bash nix.sh start 192.168.56.102:16030 hbase-2.1.0:regionserver --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}" && bash nix.sh start 192.168.56.102:16030 hbase-2.1.0:regionserver --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}
echo -e "\n==== bash nix.sh start 192.168.56.103:16030 hbase-2.1.0:regionserver --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}" && bash nix.sh start 192.168.56.103:16030 hbase-2.1.0:regionserver --zookeepers ${ZK_ALL} --hdfs.master ${HDFS_MASTER}
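Once start.sh completes, a quick jps on every node confirms the daemons are up; a sketch, assuming ssh access to each node and jps on its PATH:
for host in 192.168.56.101 192.168.56.102 192.168.56.103; do
  echo "== $host" && ssh op@$host jps   # 'op' is the service user created by nix.sh create-user
done
# every node should show QuorumPeerMain (zookeeper), DataNode, and HRegionServer;
# 192.168.56.101 additionally runs NameNode and HMaster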
Step 4: Test the cluster
- Test zookeeper
[larluo@larluo-nixos:~/my-env]$ echo ruok | nc 192.168.56.101 2181
imok
[larluo@larluo-nixos:~/my-env]$ echo ruok | nc 192.168.56.102 2181
imok
[larluo@larluo-nixos:~/my-env]$ echo ruok | nc 192.168.56.103 2181
imok
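Besides ruok, the stat four-letter command reports each node's role in the quorum:
for host in 192.168.56.101 192.168.56.102 192.168.56.103; do
  echo stat | nc $host 2181 | grep Mode
done
# expect one "Mode: leader" and two "Mode: follower"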
- Test hdfs
[larluo@larluo-nixos:~/my-env]$ su - op -c "/nix/store/p7wlb2b81dsw2kqjxnsrq4s62i8nn6xi-hadoop-3.1.1/bin/hdfs dfsadmin -fs hdfs://192.168.56.101:9000 -report"
Password:
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
Configured Capacity: 31499022336 (29.34 GB)
Present Capacity: 15335188603 (14.28 GB)
DFS Remaining: 15334603519 (14.28 GB)
DFS Used: 585084 (571.37 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.56.101:9866 (192.168.56.101)
Hostname: 192.168.56.101
Decommission Status : Normal
Configured Capacity: 10499674112 (9.78 GB)
DFS Used: 175484 (171.37 KB)
Non DFS Used: 4045742724 (3.77 GB)
DFS Remaining: 5094927957 (4.75 GB)
DFS Used%: 0.00%
DFS Remaining%: 48.52%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 13
Last contact: Sun Aug 26 14:37:35 CST 2018
Last Block Report: Sun Aug 26 14:15:08 CST 2018
Num of Blocks: 11
Name: 192.168.56.102:9866 (192.168.56.102)
Hostname: 192.168.56.102
Decommission Status : Normal
Configured Capacity: 10499674112 (9.78 GB)
DFS Used: 204800 (200 KB)
Non DFS Used: 4016508928 (3.74 GB)
DFS Remaining: 5124132437 (4.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 48.80%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 13
Last contact: Sun Aug 26 14:37:36 CST 2018
Last Block Report: Sun Aug 26 14:16:41 CST 2018
Num of Blocks: 11
Name: 192.168.56.103:9866 (192.168.56.103)
Hostname: 192.168.56.103
Decommission Status : Normal
Configured Capacity: 10499674112 (9.78 GB)
DFS Used: 204800 (200 KB)
Non DFS Used: 4025098240 (3.75 GB)
DFS Remaining: 5115543125 (4.76 GB)
DFS Used%: 0.00%
DFS Remaining%: 48.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 13
Last contact: Sun Aug 26 14:37:35 CST 2018
Last Block Report: Sun Aug 26 14:17:31 CST 2018
Num of Blocks: 11
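The report only shows that the datanodes registered; a minimal write/read round trip through the same namenode (run as the op user, with the same hdfs binary and -fs override as the report; the /smoke path is just an example):
HDFS=/nix/store/p7wlb2b81dsw2kqjxnsrq4s62i8nn6xi-hadoop-3.1.1/bin/hdfs
echo hello > /tmp/smoke.txt
$HDFS dfs -fs hdfs://192.168.56.101:9000 -mkdir -p /smoke
$HDFS dfs -fs hdfs://192.168.56.101:9000 -put -f /tmp/smoke.txt /smoke/
$HDFS dfs -fs hdfs://192.168.56.101:9000 -cat /smoke/smoke.txt   # should print: hello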
- Test hbase (via the HBase shell)
hbase(main):001:0> create 'test', 'cf'
Created table test
Took 3.2068 seconds
=> Hbase::Table - test
hbase(main):002:0> list 'test'
TABLE
test
1 row(s)
Took 0.0540 seconds
=> ["test"]
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
Took 0.2722 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
Took 0.0136 seconds
hbase(main):005:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1535266220041, value=value1
row2 column=cf:b, timestamp=1535266224174, value=value2
2 row(s)
Took 0.1010 seconds
hbase(main):008:0>
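To close the loop, read a row back and drop the table; a sketch piping commands into the same shell from bash (the bare hbase command assumes the install's bin directory is on the PATH; note a table must be disabled before it can be dropped):
echo "get 'test', 'row1'
disable 'test'
drop 'test'
list" | hbase shell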