The first article in this series introduced Hive, a big-data warehouse tool that lets business users express computation jobs as SQL statements, which makes their work much easier. The previous article covered building a single-node Hadoop environment on Linux; this post records the process of setting up the data-warehouse tool Hive on top of that environment.
## Environment
| System environment | Version |
|---|---|
| OS | Ubuntu 18.04 |
| Java | openjdk version "1.8.0_212" |
| Hadoop | 3.2.0 |
| MySQL | 5.7 |
| Hive | 3.1.1 |
## Configure MySQL

MySQL is used to store Hive's metadata (the metastore):
sudo apt install -y mysql-server
- Create the `hive` user
mysql> create user 'hive'@'localhost' identified by '123456';
mysql> grant all on *.* to 'hive'@'localhost' identified by '123456';
mysql> flush privileges;
- Verify the `hive` user
mysql> select host, user, authentication_string from mysql.user;
+-----------+------------------+-------------------------------------------+
| host | user | authentication_string |
+-----------+------------------+-------------------------------------------+
| localhost | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| localhost | mysql.session | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| localhost | mysql.sys | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| localhost | debian-sys-maint | *01B9179E564621D21BF72BE9D4D31F64EAD60870 |
| % | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| localhost | hive | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+-----------+------------------+-------------------------------------------+
6 rows in set (0.00 sec)
- Create the `hive` database
mysql -u hive -p
mysql> create database hive;
## Configure Hive

- Download Hive
wget http://mirror.bit.edu.cn/apache/hive/hive-3.1.1/apache-hive-3.1.1-bin.tar.gz
- Configure environment variables
export HIVE_HOME=/home/hduser/hadoop-family/hive-3.1.1
export PATH=${PATH}:${HADOOP_HOME}/bin/:${HADOOP_HOME}/sbin/:${HIVE_HOME}/bin:${DERBY_HOME}/bin
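To make these variables survive new login sessions, they can be appended to `~/.bashrc` and sourced. A minimal sketch, using a temp file instead of `~/.bashrc` so it can be run standalone (paths follow the layout above):

```shell
# Sketch: persist the Hive environment variables.
# A temp file stands in for ~/.bashrc so the demo is self-contained.
cat > /tmp/hive_env.sh <<'EOF'
export HIVE_HOME=/home/hduser/hadoop-family/hive-3.1.1
export PATH=${PATH}:${HIVE_HOME}/bin
EOF
. /tmp/hive_env.sh
echo "HIVE_HOME=$HIVE_HOME"   # prints HIVE_HOME=/home/hduser/hadoop-family/hive-3.1.1
```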
- Configure `hive-site.xml` (the five properties below)

There is no `hive-site.xml` under `hive-3.1.1/conf` by default; create one with `cp hive-default.xml.template hive-site.xml`, then modify the content below. Two pitfalls: in this XML file a literal `&` must be escaped as `&amp;`, and the illegal character reference `&#8;` on line 3210 of the template must be deleted.
<!--mysql connection configuration-->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore.</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<!--hiveserver2 configuration-->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>localhost</value>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
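The `&amp;` escaping required in the ConnectionURL above can be sanity-checked on a sample line. A quick sketch (the blanket substitution assumes no entities are already escaped in the text it touches):

```shell
# Demo: escape a bare & in the JDBC URL so the XML stays well-formed.
printf '<value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8&useSSL=false</value>\n' > /tmp/url-demo.xml
sed -i 's/&/\&amp;/g' /tmp/url-demo.xml   # assumes no &amp; etc. already present
cat /tmp/url-demo.xml
```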
Create a local scratch directory:

mkdir -p /home/hduser/hadoop-family/hive-3.1.1/hive_data/tmp/

Then replace every occurrence of `${system:java.io.tmpdir}` and `${system:user.name}` in `hive-site.xml` (the directory created above must match the replacement path used here):

sed -i 's/${system:java.io.tmpdir}/\/home\/hduser\/hadoop-family\/hive-3.1.1\/hive_data\/tmp/g' hive-site.xml
sed -i 's/${system:user.name}/hduser/g' hive-site.xml
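Before touching the real file, the two `sed` substitutions can be verified against a sample line (on a scratch copy):

```shell
# Demo: run the same substitutions on a sample line and inspect the result.
echo '<value>${system:java.io.tmpdir}/${system:user.name}</value>' > /tmp/sed-demo.xml
sed -i 's/${system:java.io.tmpdir}/\/home\/hduser\/hadoop-family\/hive-3.1.1\/hive_data\/tmp/g' /tmp/sed-demo.xml
sed -i 's/${system:user.name}/hduser/g' /tmp/sed-demo.xml
cat /tmp/sed-demo.xml   # prints <value>/home/hduser/hadoop-family/hive-3.1.1/hive_data/tmp/hduser</value>
```

Note that `$`, `{`, and `}` are literal here under GNU sed's basic regular expressions, which is why the pattern needs no extra escaping.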
- Download the MySQL JDBC connector, extract the archive, and place the jar under `hive-3.1.1/lib/`
wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.47.tar.gz
## Configure Hadoop

Add the following to `hadoop-3.2.0/etc/hadoop/core-site.xml`. The reason is Hadoop's proxy-user (impersonation) mechanism: an upper-layer system is not allowed to pass the real user straight through to Hadoop; instead it hands the real user to a proxy user, which performs the operations on Hadoop on its behalf, preventing arbitrary clients from manipulating Hadoop at will.
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
Before creating tables with Hive, create Hive's `hive.metastore.warehouse.dir` in HDFS and grant it `g+w` permission:
hdfs dfs -mkdir /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
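`g+w` carries the same meaning as on a local filesystem: members of the directory's group may write into it. A local-filesystem analogue (plain `/tmp` paths, not HDFS):

```shell
# Local analogue of the hdfs dfs -chmod g+w step above.
mkdir -p /tmp/warehouse-demo
chmod g+w /tmp/warehouse-demo
stat -c '%A' /tmp/warehouse-demo   # the group triplet now includes w, e.g. drwxrwxr-x
```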
## Start Hive

Since Hive 2.1, the metastore schema must be initialized by running the `schematool` command:
schematool -dbType mysql -initSchema
- Create a database
hive> create database hive_db;
- Create a table
hive> use hive_db;
hive> create table hive_table (
> id int,
> name varchar(16),
> age int,
> message varchar(128)
> );
- Insert data

Hive translates the SQL statement into MapReduce jobs and executes them:
hive> insert into hive_table values (1, 'xiaoxin', 16, 'I am wangxin'),(2, 'kunkun', 18, 'I am shikun');
Query ID = hduser_20190625042455_3600e443-663d-45f6-b842-37a3f2a8843b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1561428402986_0002, Tracking URL = http://ubuntu:8088/proxy/application_1561428402986_0002/
Kill Command = /home/hduser/hadoop-family/hadoop-3.2.0/bin/mapred job -kill job_1561428402986_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-06-25 04:25:04,263 Stage-1 map = 0%, reduce = 0%
2019-06-25 04:25:10,502 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.77 sec
2019-06-25 04:25:15,666 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.49 sec
MapReduce Total cumulative CPU time: 5 seconds 490 msec
Ended Job = job_1561428402986_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/hive_db.db/hive_table/.hive-staging_hive_2019-06-25_04-24-55_404_2009487560348382612-1/-ext-10000
Loading data to table hive_db.hive_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.49 sec HDFS Read: 18587 HDFS Write: 380 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 490 msec
OK
Time taken: 21.773 seconds
- Query data
hive> select * from hive_table;
OK
1 xiaoxin 16 I am wangxin
2 kunkun 18 I am shikun
Time taken: 0.174 seconds, Fetched: 2 row(s)
- View the files in Hadoop

The newly created database directory is visible in HDFS:
$ hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x - hduser supergroup 0 2019-06-25 04:22 /user/hive/warehouse/hive_db.db
## Start HiveServer2

After Hive 2.0, HiveServer1 was deprecated because it lacks features such as multi-client concurrency, and HiveServer2 took its place; HiveServer2 ships with its own CLI tool, Beeline.
- Start hiveserver2
$ hiveserver2
2019-06-25 19:20:16: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/hadoop-family/hive-3.1.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/hadoop-family/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 2b252413-1672-4aee-943e-b1fac451ffeb
Hive Session ID = 4937ae43-4382-4333-ac1b-458851a55d45
Hive Session ID = ea183319-7ba4-4f3b-9d3b-d6ac6477aeca
Hive Session ID = 00343592-4acb-4d21-a2ec-af4ef765b2af
- Connect to hiveserver2
# Local (embedded) connection
beeline -u jdbc:hive2://
# Remote connection
beeline -u jdbc:hive2://localhost:10000 -n hduser -p 123456
# Interactive connection test
beeline
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hduser
Enter password for jdbc:hive2://localhost:10000: 123456
- Query data

Once connected, the operations are the same as in the Hive CLI above:
0: jdbc:hive2://localhost:10000> select * from hive_db.hive_table;
INFO : Compiling command(queryId=hduser_20190625200727_652bdba6-9b35-4253-9955-9d20e9dca497): select * from hive_db.hive_table
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_table.id, type:int, comment:null), FieldSchema(name:hive_table.name, type:varchar(16), comment:null), FieldSchema(name:hive_table.age, type:int, comment:null), FieldSchema(name:hive_table.message, type:varchar(128), comment:null)], properties:null)
INFO : Completed compiling command(queryId=hduser_20190625200727_652bdba6-9b35-4253-9955-9d20e9dca497); Time taken: 0.171 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hduser_20190625200727_652bdba6-9b35-4253-9955-9d20e9dca497): select * from hive_db.hive_table
INFO : Completed executing command(queryId=hduser_20190625200727_652bdba6-9b35-4253-9955-9d20e9dca497); Time taken: 0.0 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+----------------+------------------+-----------------+---------------------+
| hive_table.id | hive_table.name | hive_table.age | hive_table.message |
+----------------+------------------+-----------------+---------------------+
| 1 | xiaoxin | 16 | I am wangxin |
| 2 | kunkun | 18 | I am shikun |
+----------------+------------------+-----------------+---------------------+
2 rows selected (0.214 seconds)