Hive lateral view explode
介绍
当我们想对hive表中某一列进行split之后,想对其转换成1 to N的模式,即一行转多列。
hive不允许我们在UDTF函数之外,再添加其它select语句。
如下,我们想将登录某个游戏的用户id放在一个字段user_ids里,对每一行数据用UDTF后输出多行。
lateral view explain
Lateral view is used in conjunction with user-defined table generatingfunctions such as explode()
. As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows foreach input row. A lateral view first applies the UDTF to each row of base tableand then joins resulting output rows to the input rows to form a virtual tablehaving the supplied table alias.
Lateral view 其实就是用来和像类似explode这种UDTF函数联用的。lateral view 会将UDTF生成的结果放到一个虚拟表中,然后这个虚拟表会和输入行即每个game_id进行join 来达到连接UDTF外的select字段的目的。
举个例子:
创建一个文件。里面两列用\t分割。
hive> create table test_lateral_view_shengli(game_id string,userl_ids string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 2.451 seconds
hive> load data local inpath '/home/hadoop/test_lateral' into table test_lateral_view_shengli;
Copying data from file:/home/hadoop/test_lateral
Copying file: file:/home/hadoop/test_lateral
Loading data to table dw.test_lateral_view_shengli
OK
Time taken: 6.716 seconds
hive> select * from test_lateral_view_shengli;
OK
game101 15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110
game66 winning1ren[[[13810537508
game101 hu330602003[[[hu330602004[[[hu330602005[[[15967506560
下面使用lateral view
hive> select game_id, user_id
> from test_lateral_view_shengli lateral view explode(split(userl_ids,'\\[\\[\\[')) snTable as user_id
> ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403301416_445839, Tracking URL = http://10.1.9.10:50030/jobdetails.jsp?jobid=job_201403301416_445839
Kill Command = /app/home/hadoop/src/hadoop-0.20.2-cdh3u5/bin/../bin/hadoop job -Dmapred.job.tracker=10.1.9.10:9001 -kill job_201403301416_445839
2014-05-16 17:39:19,108 Stage-1 map = 0%, reduce = 0%
2014-05-16 17:39:28,157 Stage-1 map = 100%, reduce = 0%
2014-05-16 17:39:38,830 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403301416_445839
OK
game101 hu330602003
game101 hu330602004
game101 hu330602005
game101 15967506560
game101 15358083654
game101 ab33787873
game101 zjy18052480603
game101 shlg1881826
game101 lxqab110
game66 winning1ren
game66 13810537508
多个lateral view
From语句后可以跟多个Lateral View。
给定数据
Array<int> col1 | Array<string> col2 |
---|---|
[1, 2] | [a", "b", "c"] |
[3, 4] | [d", "e", "f"] |
语句
SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;
还有一种情况,如果UDTF转换的Array是空的怎么办呢?
在Hive0.12里面会支持outer关键字,如果UDTF的结果是空,默认会被忽略输出。
如果加上outer关键字,则会像left outer join 一样,还是会输出select出的列,而UDTF的输出结果是NULL。
hive> select * FROM test_lateral_view_shengli LATERAL VIEW explode(array()) C AS a ;
什么也不输出
如果加上outer关键字
SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;
结果
238 val_238 NULL
86 val_86 NULL
311 val_311 NULL
27 val_27 NULL
165 val_165 NULL
409 val_409 NULL
255 val_255 NULL
278 val_278 NULL
98 val_98 NULL