Notes on Installing and Using Hive on CentOS 7.2

Hive, open-sourced by Facebook, is a data warehouse built on top of Hadoop. It maps structured data onto tables and supports SQL-like queries, known in Hive as HQL. Hive cannot update data in real time; it only supports appending data to existing tables.


[Figure: Hive architecture diagram]

Common Hive File Formats

TEXTFILE

  • The default file format; the field delimiter must be explicitly specified when the table is created
  • Storage layout: row-oriented

SequenceFile

  • A serialized file format of binary key-value pairs
  • Storage layout: row-oriented

Columnar storage formats

  • RCFILE/ORC
  • Storage layout: column-oriented
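
The format is chosen with the STORED AS clause when the table is created. A minimal sketch (table and column names are hypothetical):

create table logs_text(line string)
stored as textfile;

create table logs_orc(line string)
stored as orc;

-- data can be moved into the columnar table with an insert-select
insert into table logs_orc select * from logs_text;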

Common Data Types

1. Integer types

  • TINYINT, SMALLINT, INT, BIGINT
  • occupying 1, 2, 4, and 8 bytes respectively

2. Floating-point types

  • FLOAT, DOUBLE
  • 32-bit and 64-bit floating-point numbers respectively

3. Boolean type: BOOLEAN

  • Stores true and false

4. String type: STRING

  • Stores variable-length strings with no declared length limit

5. Timestamp type: TIMESTAMP

  • Stores timestamps with nanosecond precision
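
To make the type names concrete, here is a hypothetical table that uses one of each primitive type (all names are illustrative only):

create table t_type_demo(
id bigint,
age tinyint,
score double,
is_active boolean,
name string,
created_at timestamp
);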

Complex Data Types

1.ARRAY

  • Stores elements of a single type, retrieved by zero-based index
  • Definition: ARRAY<STRING>
  • Access: array[index]

2.MAP

  • Stores key-value pairs; all keys must share one type and all values another. Values are retrieved by key.
  • Definition: MAP<STRING,INT>
  • Access: map['key']

3.STRUCT

  • Can hold fields of different types; once the struct is declared, the position of each field is fixed.
  • Definition: STRUCT<city:STRING,address:STRING,door_num:STRING>
  • Access: struct.fieldname
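
The three access patterns side by side, as one hypothetical query (table and columns are made up; a real, runnable example appears in section 10 below):

select hobbies[0],           -- ARRAY: zero-based index
scores['math'],              -- MAP: lookup by key
addr.city                    -- STRUCT: dot notation
from t_person;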

I. Installing Hive

1. Download the Hive package and extract it

[hadoop@hadoop01 apps]$ tar -zxvf apache-hive-1.2.2-bin.tar.gz 

2. As root, create a symbolic link

[root@hadoop01 apps]# ln -s /home/hadoop/apps/apache-hive-1.2.2-bin /usr/local/hive

3. Set the owner and group of the Hive directory

[root@hadoop01 apps]# chown -R hadoop:hadoop /usr/local/hive

4. Add Hive to the system environment variables and apply them

[root@hadoop01 apps]# vim /etc/profile

Append the following:

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:${JAVA_HOME}/bin:${ZOOKEEPER_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin

Apply the changes:

[root@hadoop01 apps]# source /etc/profile
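
A quick sanity check that the variables took effect (output depends on your install, so it is not reproduced here):

[hadoop@hadoop01 ~]$ echo $HIVE_HOME
[hadoop@hadoop01 ~]$ which hive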

5. Configure the Hive metastore

Edit hive-site.xml in Hive's configuration directory as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.43.50:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hadoop</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>xxxx</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>

Note: the connection URL points at a MySQL instance; the configured user must have permission to access the database from other hosts, and ConnectionPassword should be set to that MySQL user's actual password.
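
One prerequisite the file above implies but does not show: Hive does not bundle the MySQL JDBC driver, so the connector jar has to be copied into Hive's lib directory before first start (the version in the filename below is illustrative):

[root@hadoop01 apps]# cp mysql-connector-java-5.1.44-bin.jar /usr/local/hive/lib/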

II. Using Hive

1. Start Hive

[hadoop@hadoop01 ~]$ hive

Logging initialized using configuration in jar:file:/home/hadoop/apps/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties

2. List databases

hive> show databases;
OK
default
Time taken: 0.99 seconds, Fetched: 1 row(s)

3. Create a user table: user_info

Field information:

Field name    Field type
user ID       string
area ID       string
age           int
occupation    string

Create the table:

create table user_info(
user_id string,
area_id string,
age int,
occupation string
)
row format delimited fields terminated by '\t' 
lines terminated by '\n'
stored as textfile; 
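
The resulting definition can be verified with desc (a standard Hive command; output omitted here):

hive> desc user_info;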

4. List tables

List the tables in the default database; the newly created user_info table is there:

hive> show tables;
OK
user_info
Time taken: 0.04 seconds, Fetched: 1 row(s)

Check the corresponding HDFS directory:

[hadoop@hadoop01 root]$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 19:48 /user/hive/warehouse/user_info

5. Drop a table

Dropping the user_info table also removes its directory on HDFS:

hive> drop table user_info;
OK
Time taken: 0.935 seconds
hive> show tables;
OK
Time taken: 0.041 seconds

Check the warehouse directory again:

[hadoop@hadoop01 root]$ hadoop fs -ls /user/hive/warehouse
[hadoop@hadoop01 root]$ 

6. Create a database for storing dimension data

hive> create database rel;
OK
Time taken: 0.098 seconds
hive> show databases;
OK
default
rel
Time taken: 0.025 seconds, Fetched: 2 row(s)

Check the corresponding HDFS directory:

[hadoop@hadoop01 root]$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 19:55 /user/hive/warehouse/rel.db

7. Create an internal (managed) table

Create a student information table in the rel database with the fields: student ID, name, age, and origin. Switch to the rel database first:

use rel;
create table student_info(
student_id string comment 'student ID',
name string comment 'name',
age int comment 'age',
origin string comment 'origin'
)
comment 'student information table'
row format delimited 
fields terminated by '\t' 
lines terminated by '\n'
stored as textfile;
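
To confirm that Hive treats this as a managed (internal) table, describe formatted lists Table Type: MANAGED_TABLE in its output (standard command; full output not reproduced here):

hive> describe formatted student_info;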

Check the corresponding directory:

[hadoop@hadoop01 root]$ hadoop fs -ls /user/hive/warehouse/rel.db
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 19:59 /user/hive/warehouse/rel.db/student_info

8. Load data into a table from the local filesystem

Use LOAD DATA LOCAL to load data into student_info:

hive> load data local inpath '/home/hadoop/apps/hive_test_data/student_info_data.txt' into table student_info;
Loading data to table rel.student_info
Table rel.student_info stats: [numFiles=1, totalSize=341]
OK
Time taken: 1.144 seconds

Query the student_info table and then check the corresponding file path:

hive> select * from student_info;
OK
1   xiaoming    20  11
2   xiaobai 21  31
3   zhangfei    22  44
4   likui   19  44
5   zhaoyun 21  13
6   zhangsan    20  11
7   lisi    19  11
8   wangwu  23  31
9   zhaofei 19  21
10  zhangyan    20  21
11  lihe    20  22
12  caoyang 17  32
13  lihao   19  32
14  zhaoming    21  50
15  zhouhong    18  51
16  yangshuo    23  33
17  xiaofei 24  13
18  liman   23  13
19  qianbao 20  13
20  sunce   21  41
Time taken: 0.767 seconds, Fetched: 20 row(s)

Check the corresponding directory:

[hadoop@hadoop01 hive_test_data]$ hadoop fs -ls /user/hive/warehouse/rel.db/student_info
Found 1 items
-rwxr-xr-x   3 hadoop supergroup        341 2018-01-14 20:09 /user/hive/warehouse/rel.db/student_info/student_info_data.txt
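
Note that LOAD DATA LOCAL copies the file into the warehouse, so the original stays on the local disk; a quick check:

[hadoop@hadoop01 hive_test_data]$ ls /home/hadoop/apps/hive_test_data/student_info_data.txt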

9. Load data into student_info from HDFS

First delete the existing data file:

[hadoop@hadoop01 hive_test_data]$ hadoop fs -rm -f /user/hive/warehouse/rel.db/student_info/student_info_data.txt
18/01/14 20:15:31 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hive/warehouse/rel.db/student_info/student_info_data.txt

Upload the local file to the HDFS root directory:

[hadoop@hadoop01 hive_test_data]$ hadoop fs -put /home/hadoop/apps/hive_test_data/student_info_data.txt /
[hadoop@hadoop01 hive_test_data]$ hadoop fs -ls /
Found 6 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 16:23 /addata
drwxr-xr-x   - hadoop supergroup          0 2017-12-23 20:20 /data
-rw-r--r--   3 hadoop supergroup        341 2018-01-14 20:16 /student_info_data.txt
drwxrwx---   - hadoop supergroup          0 2018-01-14 17:26 /tmp
drwxr-xr-x   - hadoop supergroup          0 2018-01-14 19:48 /user
drwxr-xr-x   - hadoop supergroup          0 2018-01-13 16:26 /wordcount

Use LOAD DATA to load the HDFS file into student_info:

hive> load data inpath '/student_info_data.txt' into table student_info;
Loading data to table rel.student_info
Table rel.student_info stats: [numFiles=1, totalSize=341]
OK
Time taken: 0.602 seconds
hive> select * from student_info;
OK
1   xiaoming    20  11
2   xiaobai 21  31
3   zhangfei    22  44
4   likui   19  44
5   zhaoyun 21  13
6   zhangsan    20  11
7   lisi    19  11
8   wangwu  23  31
9   zhaofei 19  21
10  zhangyan    20  21
11  lihe    20  22
12  caoyang 17  32
13  lihao   19  32
14  zhaoming    21  50
15  zhouhong    18  51
16  yangshuo    23  33
17  xiaofei 24  13
18  liman   23  13
19  qianbao 20  13
20  sunce   21  41
Time taken: 0.143 seconds, Fetched: 20 row(s)

Load a file into student_info in overwrite mode

  • Note: the earlier load moved student_info_data.txt from the HDFS root into the table's path /user/hive/warehouse/rel.db/student_info, so the file has to be uploaded to the HDFS root again before running the overwrite load below.

hive> load data inpath '/student_info_data.txt' overwrite into table student_info;
Loading data to table rel.student_info
Table rel.student_info stats: [numFiles=1, numRows=0, totalSize=341, rawDataSize=0]
OK
Time taken: 0.41 seconds

10. Hive Data Types

Field name       Type                                                                         Comment
user_id          string                                                                       user ID
salary           int                                                                          salary
worked_citys     array<string>                                                                cities worked in
social_security  map<string,float>                                                            social security contributions (pension, medical)
welfare          struct<meal_allowance:float,if_regular:boolean,commercial_insurance:float>   benefits: meal allowance (float), regular-employee status (boolean), commercial insurance (float)

Create the employee table:

hive> create table rel.employee(
    > user_id string,
    > salary int,
    > worked_citys array<string>,
    > social_security map<string,float>,
    > welfare struct<meal_allowance:float,if_regular:boolean,commercial_insurance:float>
    > )
    > row format delimited fields terminated by '\t' 
    > collection items terminated by ','
    > map keys terminated by ':'
    > lines terminated by '\n'
    > stored as textfile;
OK
Time taken: 0.212 seconds
hive> show tables;
OK
employee
student_info
Time taken: 0.057 seconds, Fetched: 2 row(s)

Load local data into the employee table:

hive> load data local inpath '/home/hadoop/apps/hive_test_data/employee_data.txt' into table employee;
Loading data to table rel.employee
Table rel.employee stats: [numFiles=1, totalSize=206]
OK
Time taken: 0.388 seconds
hive> select * from employee;
OK
zhangsan    10800   ["beijing","shanghai"]  {"养老":1000.0,"医疗":600.0}    {"meal_allowance":2000.0,"if_regular":true,"commercial_insurance":500.0}
lisi    20000   ["beijing","nanjing"]   {"养老":2000.0,"医疗":1200.0}   {"meal_allowance":2000.0,"if_regular":false,"commercial_insurance":500.0}
wangwu  17000   ["shanghai","nanjing"]  {"养老":1800.0,"医疗":1100.0}   {"meal_allowance":2000.0,"if_regular":true,"commercial_insurance":500.0}
Time taken: 0.127 seconds, Fetched: 3 row(s)

Query the IDs, salaries, first city worked in, pension contribution, and meal allowance of employees who have passed probation:

select user_id,
salary,
worked_citys[0],
social_security['养老'],
welfare.meal_allowance 
from rel.employee
where welfare.if_regular=true;
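
Hive's built-in collection functions also work on these columns; a small sketch using the standard built-ins map_keys, map_values, and size:

select user_id,
map_keys(social_security),    -- array of the map's keys
map_values(social_security),  -- array of the map's values
size(worked_citys)            -- number of elements in the array
from rel.employee;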

11. Create an external table [commonly used]

Field name    Field type   Comment
student_id    string       student ID
name          string       student name
institute_id  string       institute ID
major_id      string       major ID
school_year   string       enrollment year

The HDFS path can be created ahead of time:

hadoop fs -mkdir -p /user/hive/warehouse/data/student_school_info

If it is not created in advance, it will be created automatically at the specified location when the external table is created. Create the external table:

create external table rel.student_school_info(
student_id string,
name string,
institute_id string,
major_id string,
school_year string
)
row format delimited 
fields terminated by '\t' 
lines terminated by '\n'
stored as textfile
location '/user/hive/warehouse/data/student_school_info';

Check the corresponding directory:

[hadoop@hadoop01 root]$ hadoop fs -ls /user/hive/warehouse/data/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-15 14:08 /user/hive/warehouse/data/student_school_info

Upload the local data file to HDFS:

[hadoop@hadoop01 root]$ hadoop fs -put /home/hadoop/apps/hive_test_data/student_school_info_external_data.txt /user/hive/warehouse/data/student_school_info/

12. Create an internal partitioned table

Field name    Type    Comment
student_id    string  student ID
name          string  name
institute_id  string  institute ID
major_id      string  major ID

Create the partitioned table:

create table student_school_info_partition(
student_id string,
name string,
institute_id string,
major_id string
)
partitioned by(school_year string) 
row format delimited
fields terminated by '\t' 
lines terminated by '\n'
stored as textfile; 

Use INSERT INTO to load the 2017-enrollment records from student_school_info into the partitioned table student_school_info_partition:

insert into table student_school_info_partition partition(school_year='2017')
select t1.student_id,t1.name,t1.institute_id,t1.major_id
from student_school_info t1
where t1.school_year=2017;

13. View partitions

hive> show partitions student_school_info_partition;
OK
school_year=2017
Time taken: 0.191 seconds, Fetched: 1 row(s)

Check the HDFS path:

[hadoop@hadoop01 root]$ hadoop fs -ls /user/hive/warehouse/rel.db/student_school_info_partition/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-16 13:20 /user/hive/warehouse/rel.db/student_school_info_partition/school_year=2017

Query student_school_info_partition:

hive> select * from student_school_info_partition where school_year='2017';
OK
1   xiaoming    information software    2017
2   xiaobai information computer    2017
3   zhangfei    information computer    2017
4   likui   information bigdata 2017
5   zhaoyun information bigdata 2017
6   zhangsan    information software    2017
7   lisi    information bigdata 2017
8   wangwu  information computer    2017
Time taken: 0.226 seconds, Fetched: 8 row(s)

14. Drop a partition

hive> alter table student_school_info_partition drop partition (school_year='2017');
Dropped the partition school_year=2017
OK
Time taken: 0.71 seconds

15. Add data using dynamic partitioning

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table student_school_info_partition partition(school_year)
select t1.student_id,t1.name,t1.institute_id,t1.major_id,t1.school_year
from student_school_info t1;
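
If the select produces many distinct school_year values, Hive's dynamic-partition limits may need raising; these are standard settings, shown here with their default values:

set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;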

View the partitions:

hive> show partitions student_school_info_partition;
OK
school_year=2017
Time taken: 0.12 seconds, Fetched: 1 row(s)

Check the HDFS path:

[hadoop@hadoop01 root]$ hadoop fs -ls /user/hive/warehouse/rel.db/student_school_info_partition/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-01-16 13:27 /user/hive/warehouse/rel.db/student_school_info_partition/school_year=2017

16. Create an external partitioned table [commonly used]

create external table rel.student_school_info_external_partition(
student_id string,
name string,
institute_id string,
major_id string
)
partitioned by(school_year string) 
row format delimited 
fields terminated by '\t' 
lines terminated by '\n'
stored as textfile
location '/user/hive/warehouse/data/student_school_info_external_partition';

Create a school_year=2017 directory under the partitioned table's HDFS path:

hadoop fs -mkdir /user/hive/warehouse/data/student_school_info_external_partition/school_year=2017

Upload student_school_external_partition_data.txt into the school_year=2017 directory:

hadoop fs -put student_school_external_partition_data.txt /user/hive/warehouse/data/student_school_info_external_partition/school_year=2017

Although the data now sits in the partition's HDFS path, the table does not yet know about the partition, so queries return no rows.

Add the partition manually:

hive> alter table student_school_info_external_partition add partition(school_year='2017');
OK
Time taken: 0.111 seconds
hive> select * from student_school_info_external_partition;
OK
1   xiaoming    information software    2017
2   xiaobai information computer    2017
3   zhangfei    information computer    2017
4   likui   information bigdata 2017
5   zhaoyun information bigdata 2017
6   zhangsan    information software    2017
7   lisi    information bigdata 2017
8   wangwu  information computer    2017
Time taken: 0.127 seconds, Fetched: 8 row(s)
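
Instead of registering each directory by hand, msck repair table (a standard Hive command) scans the table's location and adds any partition directories it finds:

hive> msck repair table student_school_info_external_partition;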

Drop the partition:

hive> alter table student_school_info_external_partition drop partition(school_year='2017');
Dropped the partition school_year=2017
OK
Time taken: 0.19 seconds

View partitions; the partition has been removed:

hive> show partitions student_school_info_external_partition;
OK
Time taken: 0.168 seconds

Check the partition data on HDFS; the data files are still there:

[hadoop@hadoop01 hive_test_data]$ hadoop fs -ls /user/hive/warehouse/data/student_school_info_external_partition/school_year=2017
Found 1 items
-rw-r--r--   3 hadoop supergroup        250 2018-01-16 13:33 /user/hive/warehouse/data/student_school_info_external_partition/school_year=2017/student_school_external_partition_data.txt
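
Because dropping an external partition leaves the files behind, removing the data for good means deleting the HDFS directory yourself (only do this if the data really is no longer needed):

[hadoop@hadoop01 hive_test_data]$ hadoop fs -rm -r /user/hive/warehouse/data/student_school_info_external_partition/school_year=2017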