5. Installing HBase
HBase: a real-time KV / wide-column database built on top of HDFS
For the HBase/Hadoop version compatibility matrix, see the Apache HBase® Reference Guide.
5.1 Download and Extract HBase
# Upload the archive to the server
tar zxf hbase-2.3.0-bin.tar.gz -C /usr/local
mv /usr/local/hbase-2.3.0/ /usr/local/hbase
5.2 Configure Environment Variables (~/.bashrc)
# Append the following to the end of the file:
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
# Apply the changes
source ~/.bashrc
# Check the version
hbase version
.......HBase 2.3.0
5.3 Modify the Configuration
- hbase-env.sh
cd /usr/local/hbase/conf
vi hbase-env.sh # add the following:
export JAVA_HOME=/usr/local/jdk1.8.0_211
export HBASE_CLASSPATH=/usr/local/hadoop/conf
export HBASE_MANAGES_ZK=true # use the built-in ZooKeeper
- hbase-site.xml:
vi hbase-site.xml # make sure the following properties are present
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>./tmp</value>
</property>
<!-- Disable the stream capability check, otherwise the Master fails to start -->
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<!-- Points at the HDFS path -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://xx-bigdata-server:9000/hbase</value>
</property>
<!-- Enable Phoenix schema-to-namespace mapping -->
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<!-- Key: make ZooKeeper listen on all interfaces -->
<property>
<name>hbase.zookeeper.property.clientPortAddress</name>
<value>0.0.0.0</value>
</property>
<!-- Whitelist the ruok, stat, mntr, conf, isro four-letter-word commands -->
<property>
<name>hbase.zookeeper.property.4lw.commands.whitelist</name>
<value>ruok,stat,mntr,conf,isro</value>
</property>
</configuration>
Note that the port in hbase.rootdir must match the fs.defaultFS entry in Hadoop's core-site.xml.
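A quick way to confirm the live value (hdfs getconf reads the effective Hadoop configuration):
hdfs getconf -confKey fs.defaultFS # e.g. hdfs://xx-bigdata-server:9000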
5.4 Start and Verify
start-hbase.sh
jps | grep HMaster
hbase shell # enter the interactive shell
> create 'test', 'cf' # create a test table
> list # list tables
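Optionally, write one cell and read it back to exercise the full read/write path (continuing in the same shell; 'test' and 'cf' are the table and column family created above):
> put 'test', 'row1', 'cf:a', 'value1' # write a single cell
> scan 'test' # should return row1 with cf:a = value1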
Open the HBase Master web UI: http://192.168.1.157:16010/
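Since the whitelist above enables ruok and stat, the embedded ZooKeeper can also be probed directly; a quick check, assuming nc (netcat) is installed:
echo ruok | nc 127.0.0.1 2181 # expect: imok
echo stat | nc 127.0.0.1 2181 # prints server mode and client connections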
6. Installing Phoenix
Phoenix: a JDBC/SQL layer on top of HBase
The Phoenix/HBase compatibility matrix is at: http://phoenix.apache.org/download.html
Phoenix 5.1.3 matches HBase 2.3; in mainland China it is faster to download the apache-phoenix-5.1.3 package from a domestic mirror such as the Alibaba Cloud open-source mirror site.
tar zxvf phoenix-hbase-2.3-5.1.3-bin.tar.gz -C /usr/local/
mv /usr/local/phoenix-hbase-2.3-5.1.3-bin /usr/local/phoenix
vi ~/.bashrc # add the following line
export PHOENIX_HOME=/usr/local/phoenix
source ~/.bashrc
cp /usr/local/phoenix/phoenix-server-hbase-2.3-5.1.3.jar $HBASE_HOME/lib/
vi $HBASE_HOME/conf/hbase-site.xml # make sure the following property is present (already added in 5.3)
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value> <!-- enable schema-to-namespace mapping -->
</property>
# Restart HBase
stop-hbase.sh && start-hbase.sh
# sqlline.py needs a `python` binary on PATH; point it at python3
ln -s /usr/bin/python3 /usr/bin/python
# Start the Phoenix CLI
$PHOENIX_HOME/bin/sqlline.py localhost
> CREATE TABLE test_phx (id INTEGER PRIMARY KEY, name VARCHAR);
> UPSERT INTO test_phx VALUES (1, 'phoenix_test');
> SELECT * FROM test_phx;
+----+--------------+
| ID | NAME |
+----+--------------+
| 1 | phoenix_test |
+----+--------------+
1 row selected (0.047 seconds)
> !quit # exit
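With phoenix.schema.isNamespaceMappingEnabled=true, a Phoenix schema maps to an HBase namespace. A minimal sketch to see the mapping, back in sqlline (DEMO_SCHEMA is an illustrative name):
> CREATE SCHEMA IF NOT EXISTS DEMO_SCHEMA;
> CREATE TABLE DEMO_SCHEMA.T1 (id INTEGER PRIMARY KEY, v VARCHAR);
Afterwards, running list_namespace in hbase shell should show a DEMO_SCHEMA namespace.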
7. Installing Hive
Hive: treat files on HDFS as "tables" and query them with SQL
7.1 Basic Installation
Check the Hive/Hadoop compatibility matrix on the Apache Hive Downloads page.
tar zxvf apache-hive-2.3.7-bin.tar.gz -C /usr/local/
cd /usr/local/ && mv apache-hive-2.3.7-bin/ hive
vi ~/.bashrc # add the following lines
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bashrc
# Initialize the metastore (embedded Derby by default)
schematool -dbType derby -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
......
Initialization script completed
schemaTool completed
# Verify
hive # start the CLI
hive> CREATE TABLE demo (id INT, name STRING); # test table creation
hive> INSERT INTO demo VALUES (1, 'hive_test'); # requires MapReduce/YARN to be running
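Note that the embedded Derby metastore creates a metastore_db directory in whatever directory hive was launched from and allows only one session at a time, which is why the next step moves the metadata to MySQL. A quick way to see where it landed:
ls -d metastore_db derby.log # created in the current working directory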
7.2 Store Hive Metadata in MySQL
- MySQL preparation
The MySQL server is at 192.168.1.156. Log in to it and run:
> CREATE DATABASE hive_metadata;
> CREATE USER 'hive_user'@'%' IDENTIFIED BY 'xxxxxx';
> GRANT ALL PRIVILEGES ON hive_metadata.* TO 'hive_user'@'%';
> FLUSH PRIVILEGES;
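Before changing the Hive configuration, it is worth confirming that the Hive host can reach this account remotely (assuming the mysql client is installed there):
mysql -h 192.168.1.156 -u hive_user -p -e "SHOW DATABASES;" # should list hive_metadata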
- Hive configuration changes
vi $HIVE_HOME/conf/hive-site.xml # add the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.156:3306/hive_metadata?createDatabaseIfNotExist=true&amp;useSSL=false</value> <!-- &amp; is the XML escape for & -->
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive_user</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>xxxxxx</value>
</property>
</configuration>
- Install the MySQL JDBC driver
The MySQL server here is 5.7 (use Connector/J 5.1.x for MySQL 5.7 and 8.0.x for MySQL 8.0; with 8.0.x the driver class is com.mysql.cj.jdbc.Driver).
cp mysql-connector-java-5.1.28.jar $HIVE_HOME/lib/
- Initialize the metastore
schematool -dbType mysql -initSchema # note the mysql dbType
SLF4J: Class path contains multiple SLF4J bindings.
......
Initialization script completed
schemaTool completed
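As a sanity check, schematool can also report the connection URL and schema version it ended up with:
schematool -dbType mysql -info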
7.3 Verify the Configuration
hive -e "CREATE DATABASE test_db; SHOW DATABASES;" # expected output:
SLF4J: Class path contains multiple SLF4J bindings.
......
OK
Time taken: 4.292 seconds
OK
default
test_db
Time taken: 0.142 seconds, Fetched: 2 row(s)
# Then, on the MySQL server, query the metastore:
> select DB_ID,NAME,OWNER_NAME,OWNER_TYPE from hive_metadata.DBS ;
+-------+---------+------------+------------+
| DB_ID | NAME | OWNER_NAME | OWNER_TYPE |
+-------+---------+------------+------------+
| 1 | default | public | ROLE |
| 2 | test_db | root | USER |
+-------+---------+------------+------------+
2 rows in set (0.00 sec)
As shown above, Hive's metadata is now stored in MySQL.
7.4 Start the Hive WebUI
# Start HDFS and YARN first
start-dfs.sh
start-yarn.sh
# Start HiveServer2 with the WebUI explicitly enabled
hive --service hiveserver2 \
--hiveconf hive.server2.webui.port=10002 \
--hiveconf hive.server2.webui.max.threads=50 \
> hiveserver2.log 2>&1 &
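HiveServer2 can take a minute to come up; once it has, a Beeline connection makes a good smoke test (assuming the default Thrift port 10000 and no authentication configured):
beeline -u jdbc:hive2://localhost:10000 -n root -e "SHOW DATABASES;"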
Then open: http://192.168.1.157:10002/