Big Data Development with Hive, Part 4: Hive Data Manipulation Language

Note:
Hive version 2.1.1

1. Overview of Hive DML (Data Manipulation Language)

Hive's DML statements are inherited from traditional relational databases, with some Hive-specific additions on top.
In data warehouse projects, DML is used constantly to implement business logic.

Statement | Description | Notes
LOAD | Loads data into a table | The source can be an operating-system file or a file on HDFS
INSERT INTO TABLE ... FROM query | Inserts a query result into a table or partition | The most frequently used statement in production
CTAS | Creates a table directly from a query result | Skips the CREATE TABLE statement; the data goes straight into the auto-created table
Writing data into the filesystem from queries | Writes a query result into a filesystem directory | Exports Hive table data to files
INSERT | Standard INSERT statement | INSERT ... VALUES, the standard way to insert literal rows
UPDATE | Updates rows | Requires transactions to be enabled
DELETE | Deletes rows | Requires transactions to be enabled
MERGE | Updates matching rows, inserts non-matching ones | Requires transactions; supported from Hive 2.2

2. The LOAD Command

Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
 
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

A LOAD is a plain filesystem move/copy; it does not launch a MapReduce job.
If the table is partitioned, you must name the exact target partition (values for every partition column).
filepath can be a file or a directory. A file is moved/copied into the table's path; for a directory, all files under it are moved/copied, and the directory must not contain subdirectories.
With the LOCAL keyword the local file is copied into the table's path; without LOCAL, the file on HDFS is moved there.
With OVERWRITE the existing contents of the table or partition are deleted first and replaced by the contents of filepath; otherwise the contents of filepath are appended to the table or partition path.

2.1 Preparing test data

Hand-crafted test data:
Empty values appear in three forms: null, the empty string '', and a field left blank.

[root@hp1 hive]# more test1.txt 
1,'abc','2020-01-01'
2,'def','2020-01-01 23:00:01'
3,'ghi','2020-01-02 19:00:10'
4,'aaa','2020-01-02 19:00:20'
5,,'2020-01-02 19:00:30'
6,'abc',
7,'def','2020-01-03 19:00:30'
8,'abc','2020-02-02 19:00:30'

Upload test1.txt to HDFS:

[root@hp1 hive]# hadoop fs -put test1.txt /tmp
[root@hp1 hive]# 
[root@hp1 hive]# hadoop fs -ls /tmp
Found 6 items
d---------   - hdfs   supergroup          0 2020-12-03 18:16 /tmp/.cloudera_health_monitoring_canary_files
drwxrwxrwt   - root   supergroup          0 2020-12-03 11:15 /tmp/db_test3.db
drwxrwxrwt   - root   supergroup          0 2020-12-03 11:16 /tmp/db_test4.db
drwx-wx-wx   - hive   supergroup          0 2020-11-18 17:57 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2020-11-19 15:36 /tmp/logs
-rw-rw-rw-   3 root   supergroup        166 2020-12-03 18:16 /tmp/test1.txt
[root@hp1 hive]# 

The Hive table:

CREATE TABLE t2(
  user_id int, 
  name varchar(100) COMMENT 'user name', 
  loagin_date string);

2.2 Loading a local server file into a Hive table

LOAD DATA LOCAL INPATH '/backup/hive/test1.txt' into table t2;

Import attempt 1
Test log:
Every column comes back as NULL, because the table was created without a field delimiter, so Hive's default separator (\001) cannot split the comma-separated lines.

hive> 
    > LOAD DATA LOCAL INPATH '/backup/hive/test1.txt' into table t2; 
Loading data to table test.t2
OK
Time taken: 0.867 seconds
hive> select * from t2;
OK
NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL
NULL    NULL    NULL
Time taken: 0.446 seconds, Fetched: 9 row(s)

Recreate the table, this time specifying a comma as the field delimiter:

drop table t2;
CREATE TABLE t2(
  user_id int, 
  name varchar(100) COMMENT 'user name', 
  loagin_date string) row format delimited fields terminated by ',' stored as textfile;

Import attempt 2
The numeric and string columns now load correctly. Note that the single quotes in the file are stored as part of the values, so the dates land in the string column quotes and all.

hive> 
    > 
    > LOAD DATA LOCAL INPATH '/backup/hive/test1.txt' into table t2; 
Loading data to table test.t2
OK
Time taken: 0.632 seconds
hive> select * from t2;
OK
1       'abc'   '2020-01-01'
2       'def'   '2020-01-01 23:00:01'
3       'ghi'   '2020-01-02 19:00:10'
4       'aaa'   '2020-01-02 19:00:20'
5               '2020-01-02 19:00:30'
6       'abc'
7       'def'   '2020-01-03 19:00:30'
8       'abc'   '2020-02-02 19:00:30'
Time taken: 0.073 seconds, Fetched: 8 row(s)

2.3 Loading an HDFS file into a Hive table

Compared with loading an operating-system file, the only difference is that the LOCAL keyword is dropped; everything else is the same.

create table t3 like t2;
LOAD DATA INPATH '/tmp/test1.txt' into table t3;

Test log:

hive> 
    > create table t3 like t2;
OK
Time taken: 0.078 seconds
hive> 
    > LOAD DATA  INPATH '/tmp/test1.txt' into table t3;
Loading data to table test.t3
OK
Time taken: 0.605 seconds
hive> select * from t3;
OK
1       'abc'   '2020-01-01'
2       'def'   '2020-01-01 23:00:01'
3       'ghi'   '2020-01-02 19:00:10'
4       'aaa'   '2020-01-02 19:00:20'
5               '2020-01-02 19:00:30'
6       'abc'
7       'def'   '2020-01-03 19:00:30'
8       'abc'   '2020-02-02 19:00:30'
Time taken: 0.064 seconds, Fetched: 8 row(s)
hive> 
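Neither load above used OVERWRITE, so running the same LOAD again would append a second copy of the file's rows. A sketch of the replacing variant (the non-LOCAL load above moved /tmp/test1.txt into the warehouse, so the file would have to be re-uploaded first):

```sql
-- Re-upload first: hadoop fs -put test1.txt /tmp
-- OVERWRITE removes the table's existing contents before loading
LOAD DATA INPATH '/tmp/test1.txt' OVERWRITE INTO TABLE t3;
```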

3. INSERT INTO TABLE ... FROM Query

Syntax:

Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
 
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

If the table is partitioned, you must name the exact target partition (values for every partition column).
With OVERWRITE the existing contents of the table or partition are deleted first and replaced by the query result; otherwise the query result is appended to the table or partition path.

3.1 INSERT INTO ... SELECT

The INSERT INTO ... SELECT statement follows the same syntax as in relational databases and is used to load data from another table.

create table t4 like t3;
insert into t4 select user_id,name,loagin_date from t3;
select * from t4;

Test log:

hive> 
    > 
    > create table t4 like t3;
OK
Time taken: 0.091 seconds
hive> insert into t4 select user_id,name,loagin_date from t3;
Query ID = root_20201204105325_8e9447df-993b-4b9c-b347-fd0c60382987
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0021, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0021/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0021
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-04 10:53:33,921 Stage-1 map = 0%,  reduce = 0%
2020-12-04 10:53:39,158 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.78 sec
MapReduce Total cumulative CPU time: 1 seconds 780 msec
Ended Job = job_1606698967173_0021
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/t4/.hive-staging_hive_2020-12-04_10-53-25_821_7213321429457977661-1/-ext-10000
Loading data to table test.t4
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.78 sec   HDFS Read: 4468 HDFS Write: 269 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 780 msec
OK
Time taken: 16.067 seconds
hive> 
    > select * from t4;
OK
1       'abc'   '2020-01-01'
2       'def'   '2020-01-01 23:00:01'
3       'ghi'   '2020-01-02 19:00:10'
4       'aaa'   '2020-01-02 19:00:20'
5               '2020-01-02 19:00:30'
6       'abc'
7       'def'   '2020-01-03 19:00:30'
8       'abc'   '2020-02-02 19:00:30'
Time taken: 0.066 seconds, Fetched: 8 row(s)

3.2 INSERT OVERWRITE ... SELECT

INSERT OVERWRITE TABLE ... SELECT first truncates the target table, then loads the query result into it.
This statement originated in Hive, and it is used all the time for full data refreshes in a data warehouse.
Syntax:

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

Code:

-- Continuing the example above: insert once more
insert into t4 select user_id,name,loagin_date from t3;
-- The query now returns duplicated rows
select * from t4;
-- Switch to insert overwrite table
insert overwrite table t4 select user_id,name,loagin_date from t3;
-- Verify the data again
select * from t4;

Test log:

hive> 
    > insert into t4 select user_id,name,loagin_date from t3;
Query ID = root_20201204110806_8624193d-6311-44f5-8575-2abeb6c9e102
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0022, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0022/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-04 11:08:14,896 Stage-1 map = 0%,  reduce = 0%
2020-12-04 11:08:22,132 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.72 sec
MapReduce Total cumulative CPU time: 1 seconds 720 msec
Ended Job = job_1606698967173_0022
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/t4/.hive-staging_hive_2020-12-04_11-08-06_488_865341941305657171-1/-ext-10000
Loading data to table test.t4
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.72 sec   HDFS Read: 4552 HDFS Write: 269 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 720 msec
OK
Time taken: 17.314 seconds
hive> select * from t4;
OK
1       'abc'   '2020-01-01'
2       'def'   '2020-01-01 23:00:01'
3       'ghi'   '2020-01-02 19:00:10'
4       'aaa'   '2020-01-02 19:00:20'
5               '2020-01-02 19:00:30'
6       'abc'
7       'def'   '2020-01-03 19:00:30'
8       'abc'   '2020-02-02 19:00:30'
1       'abc'   '2020-01-01'
2       'def'   '2020-01-01 23:00:01'
3       'ghi'   '2020-01-02 19:00:10'
4       'aaa'   '2020-01-02 19:00:20'
5               '2020-01-02 19:00:30'
6       'abc'
7       'def'   '2020-01-03 19:00:30'
8       'abc'   '2020-02-02 19:00:30'
Time taken: 0.066 seconds, Fetched: 16 row(s)
hive> 
    > insert overwrite table t4 select user_id,name,loagin_date from t3;
Query ID = root_20201204110931_49346b1e-779f-4b4f-830f-7f363b2855c9
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0023, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0023/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0023
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-04 11:09:38,356 Stage-1 map = 0%,  reduce = 0%
2020-12-04 11:09:43,517 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.75 sec
MapReduce Total cumulative CPU time: 1 seconds 750 msec
Ended Job = job_1606698967173_0023
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/t4/.hive-staging_hive_2020-12-04_11-09-31_276_442497908099823168-1/-ext-10000
Loading data to table test.t4
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.75 sec   HDFS Read: 4557 HDFS Write: 269 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 750 msec
OK
Time taken: 14.942 seconds
hive> select * from t4;
OK
1       'abc'   '2020-01-01'
2       'def'   '2020-01-01 23:00:01'
3       'ghi'   '2020-01-02 19:00:10'
4       'aaa'   '2020-01-02 19:00:20'
5               '2020-01-02 19:00:30'
6       'abc'
7       'def'   '2020-01-03 19:00:30'
8       'abc'   '2020-02-02 19:00:30'
Time taken: 0.066 seconds, Fetched: 8 row(s)

3.3 Multiple inserts

Syntax:

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Requirement:
Split the employee table into two tables, one for employees with a bonus and one for those without.
Normally this takes two INSERT statements; with a multiple insert, a single statement does the job.

Code:

create table emp_bonus like emp;
create table emp_no_bonus like emp;

from emp e
insert into emp_bonus
select empno,ename,job,mgr,hiredate,sal,comm,deptno 
 where comm > 0
insert into emp_no_bonus
select empno,ename,job,mgr,hiredate,sal,comm,deptno 
 where comm = 0 or comm is null;

select * from emp_bonus;
select * from emp_no_bonus;

Test log:

hive> 
    > create table emp_bonus like emp;
OK
Time taken: 0.078 seconds
hive> create table emp_no_bonus like emp;
OK
Time taken: 0.095 seconds
hive> 
    > from emp e
    > insert into emp_bonus
    > select empno,ename,job,mgr,hiredate,sal,comm,deptno 
    >  where comm > 0
    > insert into emp_no_bonus
    > select empno,ename,job,mgr,hiredate,sal,comm,deptno 
    >  where comm = 0 or comm is null;
Query ID = root_20201204113103_98083a55-2d8a-44d5-85fa-67af5369a21d
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0024, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0024/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0024
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 0
2020-12-04 11:31:11,214 Stage-2 map = 0%,  reduce = 0%
2020-12-04 11:31:17,437 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 6.82 sec
MapReduce Total cumulative CPU time: 6 seconds 820 msec
Ended Job = job_1606698967173_0024
Stage-5 is filtered out by condition resolver.
Stage-4 is selected by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is filtered out by condition resolver.
Stage-10 is selected by condition resolver.
Stage-12 is filtered out by condition resolver.
Launching Job 4 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0025, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0025/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0025
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 0
2020-12-04 11:31:29,467 Stage-4 map = 0%,  reduce = 0%
2020-12-04 11:31:35,647 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 1.53 sec
MapReduce Total cumulative CPU time: 1 seconds 530 msec
Ended Job = job_1606698967173_0025
Launching Job 5 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0026, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0026/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0026
Hadoop job information for Stage-10: number of mappers: 1; number of reducers: 0
2020-12-04 11:31:47,670 Stage-10 map = 0%,  reduce = 0%
2020-12-04 11:31:53,852 Stage-10 map = 100%,  reduce = 0%, Cumulative CPU 1.57 sec
MapReduce Total cumulative CPU time: 1 seconds 570 msec
Ended Job = job_1606698967173_0026
Loading data to table test.emp_bonus
Loading data to table test.emp_no_bonus
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 2   Cumulative CPU: 6.82 sec   HDFS Read: 15263 HDFS Write: 965 HDFS EC Read: 0 SUCCESS
Stage-Stage-4: Map: 1   Cumulative CPU: 1.53 sec   HDFS Read: 3180 HDFS Write: 157 HDFS EC Read: 0 SUCCESS
Stage-Stage-10: Map: 1   Cumulative CPU: 1.57 sec   HDFS Read: 3568 HDFS Write: 520 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 920 msec
OK
Time taken: 52.03 seconds
hive> select * from emp_bonus;
OK
7521    WARD    SALESMAN        7698    1981-02-22      1250.0  500.0   30
7654    MARTIN  SALESMAN        7698    1981-09-28      1250.0  1400.0  30
7499    ALLEN   SALESMAN        7698    1981-02-20      1600.0  300.0   30
Time taken: 0.081 seconds, Fetched: 3 row(s)
hive> 
    > select * from emp_no_bonus;
OK
7566    JONES   MANAGER 7839    1981-04-02      2975.0  NULL    20
7698    BLAKE   MANAGER 7839    1981-05-01      2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-06-09      2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-06-13      3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-09-08      1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-06-13      1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-03      950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-03      3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-01-23      1300.0  NULL    10
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
Time taken: 0.059 seconds, Fetched: 11 row(s)
hive> 

3.4 Dynamic partition inserts

Syntax:

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

You can specify values for only some of the partition columns. Partition columns given a fixed value are called static partitions; the unspecified ones are dynamic partitions.
The value of a dynamic partition is taken from the corresponding column of the SELECT.

hive.exec.dynamic.partition.mode=[strict|nonstrict]
In strict mode at least one static partition must be specified; in nonstrict mode every partition column may be dynamic.
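As an illustration of the rules above (the tables orders_src and orders here are hypothetical, not from the earlier examples), a fully dynamic insert might look like this:

```sql
-- Allow dynamic partitions, with no static partition required
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- dt has no fixed value, so each row goes into the partition named by
-- its own dt value; dynamic partition columns must come last in the SELECT
INSERT OVERWRITE TABLE orders PARTITION (dt)
SELECT order_id, amount, dt FROM orders_src;
```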

4. CTAS

CTAS skips the separate CREATE TABLE statement and loads the query result directly into a newly created table. It is handy for temporary tables and backup tables.

Syntax:

CREATE TABLE tablename
[ROW FORMAT row_format] [STORED AS file_format]
AS SELECT … FROM

Test log:

hive> 
    > 
    > 
    > desc emp_ctas;
FAILED: SemanticException [Error 10001]: Table not found emp_ctas
hive> create table emp_ctas as select * from emp where 1 = 1;
Query ID = root_20201204144641_1e70859c-276e-4ac4-b593-225f3d9e2770
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0027, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0027/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0027
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-04 14:46:48,605 Stage-1 map = 0%,  reduce = 0%
2020-12-04 14:46:54,786 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.69 sec
MapReduce Total cumulative CPU time: 3 seconds 690 msec
Ended Job = job_1606698967173_0027
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0028, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0028/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0028
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2020-12-04 14:47:07,574 Stage-3 map = 0%,  reduce = 0%
2020-12-04 14:47:13,760 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.52 sec
MapReduce Total cumulative CPU time: 1 seconds 520 msec
Ended Job = job_1606698967173_0028
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/emp_ctas
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 3.69 sec   HDFS Read: 9949 HDFS Write: 816 HDFS EC Read: 0 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 1.52 sec   HDFS Read: 3004 HDFS Write: 677 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 210 msec
OK
Time taken: 33.892 seconds
hive> 
    > select * from emp_ctas;
OK
7521    WARD    SALESMAN        7698    1981-02-22      1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-04-02      2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-09-28      1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-05-01      2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-06-09      2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-06-13      3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-09-08      1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-06-13      1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-03      950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-03      3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-01-23      1300.0  NULL    10
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-02-20      1600.0  300.0   30
Time taken: 0.061 seconds, Fetched: 14 row(s)

5. Writing Data into the Filesystem from Queries

Writes a query's result set into a filesystem directory.

Syntax:


Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
 
  
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)

Code:

insert overwrite local directory '/tmp/emp.txt'
select * from emp;

Test log:

hive> 
    > insert overwrite local directory '/tmp/emp.txt'
    > select * from emp;
Query ID = root_20201204145220_d4e24136-f156-466c-bf0d-f9e36e9a82fe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0029, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0029/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0029
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2020-12-04 14:52:26,262 Stage-1 map = 0%,  reduce = 0%
2020-12-04 14:52:33,455 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.42 sec
MapReduce Total cumulative CPU time: 3 seconds 420 msec
Ended Job = job_1606698967173_0029
Moving data to local directory /tmp/emp.txt
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 3.42 sec   HDFS Read: 9863 HDFS Write: 677 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 420 msec
OK
Time taken: 14.343 seconds


[root@hp1 hive]# more /tmp/emp.txt/

*** /tmp/emp.txt/: directory ***

[root@hp1 hive]# cd /tmp/emp.txt/
[root@hp1 emp.txt]# ls
000000_0  000001_0
[root@hp1 emp.txt]# more 000000_0
7521^AWARD^ASALESMAN^A7698^A1981-02-22^A1250.0^A500.0^A30
7566^AJONES^AMANAGER^A7839^A1981-04-02^A2975.0^A\N^A20
7654^AMARTIN^ASALESMAN^A7698^A1981-09-28^A1250.0^A1400.0^A30
7698^ABLAKE^AMANAGER^A7839^A1981-05-01^A2850.0^A\N^A30
7782^ACLARK^AMANAGER^A7839^A1981-06-09^A2450.0^A\N^A10
7788^ASCOTT^AANALYST^A7566^A1987-06-13^A3000.0^A\N^A20
7839^AKING^APRESIDENT^A\N^A1981-11-17^A5000.0^A\N^A10
7844^ATURNER^ASALESMAN^A7698^A1981-09-08^A1500.0^A0.0^A30
7876^AADAMS^ACLERK^A7788^A1987-06-13^A1100.0^A\N^A20
7900^AJAMES^ACLERK^A7698^A1981-12-03^A950.0^A\N^A30
7902^AFORD^AANALYST^A7566^A1981-12-03^A3000.0^A\N^A20
7934^AMILLER^ACLERK^A7782^A1982-01-23^A1300.0^A\N^A10
[root@hp1 emp.txt]# more 000001_0
7369^ASMITH^ACLERK^A7902^A1980-12-17^A800.0^A\N^A20
7499^AALLEN^ASALESMAN^A7698^A1981-02-20^A1600.0^A300.0^A30
[root@hp1 emp.txt]# 
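The separator in the exported files is Hive's default field delimiter, \001 (Ctrl-A, shown as ^A above). To get a readable export, the statement can specify a row format, as the syntax block allows from Hive 0.11.0 onward (a sketch; the output directory /tmp/emp_csv is arbitrary):

```sql
-- Export comma-separated instead of \001-separated
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/emp_csv'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM emp;
```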

6. INSERT

The standard SQL INSERT statement; it is used the same way as in a traditional relational database.

Syntax:

Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
  
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal

Requirement: insert rows into the emp_ctas table.

Code:

insert into emp_ctas(empno,ename,deptno) values (1,'test1',10);
insert into emp_ctas(empno,ename,deptno) values (2,'test2',20),(3,'test3',30);

Test log:

hive> 
    > insert into emp_ctas(empno,ename,deptno) values (1,'test1',10);
Query ID = root_20201204145950_46e74db1-30a2-47ef-bc68-7334c0bfe75d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0030, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0030/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0030
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-04 14:59:57,007 Stage-1 map = 0%,  reduce = 0%
2020-12-04 15:00:05,251 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.97 sec
MapReduce Total cumulative CPU time: 2 seconds 970 msec
Ended Job = job_1606698967173_0030
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/emp_ctas/.hive-staging_hive_2020-12-04_14-59-50_800_1003035375971432143-1/-ext-10000
Loading data to table test.emp_ctas
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.97 sec   HDFS Read: 5352 HDFS Write: 95 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 970 msec
OK
Time taken: 16.171 seconds
hive> insert into emp_ctas(empno,ename,deptno) values (2,'test2',20),(3,'test3',30);
Query ID = root_20201204150012_55a81ff8-bb7f-4e07-b77b-371655c88c3e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1606698967173_0031, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0031/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1606698967173_0031
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2020-12-04 15:00:18,975 Stage-1 map = 0%,  reduce = 0%
2020-12-04 15:00:25,227 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.0 sec
MapReduce Total cumulative CPU time: 3 seconds 0 msec
Ended Job = job_1606698967173_0031
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/emp_ctas/.hive-staging_hive_2020-12-04_15-00-12_024_6107099678250146899-1/-ext-10000
Loading data to table test.emp_ctas
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 3.0 sec   HDFS Read: 5366 HDFS Write: 121 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 0 msec
OK
Time taken: 14.84 seconds
hive> 

7. The UPDATE Statement

Syntax:

Standard Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]

Transactions are not enabled by default, so an UPDATE statement fails with the error below:

hive> 
    > update emp_ctas set comm = 100 where ename = 'test1';
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive> 
    > 
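For context, the error occurs because the default transaction manager does not support UPDATE or DELETE. A hedged sketch of the usual Hive 2.x prerequisites (the table emp_acid is hypothetical; the details belong to the later transactions chapter):

```sql
-- Session settings required for ACID operations
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- In Hive 2.x an UPDATE/DELETE target must be a bucketed ORC table
-- with the transactional property set
CREATE TABLE emp_acid (empno int, ename string, comm double)
CLUSTERED BY (empno) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

UPDATE emp_acid SET comm = 100 WHERE ename = 'test1';
```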

ACID transactions will be covered in detail in a later chapter on transactions.

8. The DELETE Statement

Syntax:

Standard Syntax:
DELETE FROM tablename [WHERE expression]
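Like UPDATE, DELETE only runs against a transactional (ACID) table; on an ordinary table it fails with the same SemanticException as UPDATE. A minimal sketch, assuming a hypothetical transactional table emp_acid:

```sql
-- Requires an ACID-enabled table and DbTxnManager (see the UPDATE section)
DELETE FROM emp_acid WHERE empno = 1;
```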

ACID transactions will be covered in detail in a later chapter on transactions.

9. MERGE

MERGE works the same way as the MERGE statement in relational databases: rows that match the ON condition are updated or deleted, and rows that do not match are inserted.

Syntax:

Standard Syntax:
MERGE INTO <target table> AS T USING <source expression/table> AS S
ON <boolean expression1>
WHEN MATCHED [AND <boolean expression2>] THEN UPDATE SET <set clause list>
WHEN MATCHED [AND <boolean expression3>] THEN DELETE
WHEN NOT MATCHED [AND <boolean expression4>] THEN INSERT VALUES<value list>

MERGE is only supported from Hive 2.2 onward, so it is skipped here (this cluster runs Hive 2.1.1).
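Although it cannot run on this 2.1.1 cluster, a sketch of the statement against hypothetical transactional tables customers (target) and customers_stage (source) would be:

```sql
MERGE INTO customers AS T
USING customers_stage AS S
ON T.id = S.id
WHEN MATCHED THEN UPDATE SET name = S.name           -- existing id: update
WHEN NOT MATCHED THEN INSERT VALUES (S.id, S.name);  -- new id: insert
```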

References

1.https://cwiki.apache.org/confluence/display/Hive/LanguageManual
