一.HIVE的基础知识
- Hive复合数据类型:map
– 创建map:map、str_to_map
– 取key、value:map_keys、map_values
– 使用:map与lateral view - Hive的窗口和分析函数入门
– row_number、rank、dense_rank等对一窗口内给定列进行:取行号、排名 - 企业应用信息安全
– Hive、Impala转换函数translate进行简单数据保护 - HiveServer2 JDBC接口实例应用、中文支持Bug纠错
一. 数据准备之数据库设计
原始三个表后面的表是衍生出来的
三, 数据表详细设计
-
f_orders 的数据内容
11 2014-05-01 06:01:12.334+01 10703007267488 item8:2|item1:1 22 2014-05-01 07:28:12.342+01 10101043505096 item6:3|item3:2 33 2014-05-01 07:50:12.33+01 10103043509747 item7:7 11 2014-05-01 09:27:12.33+01 10103043501575 item5:5|item1:1|item4:1|item9:1 22 2014-05-01 09:03:12.324+01 10104043514061 item1:3 33 2014-05-02 19:10:12.343+01 11003002067594 item4:2|item1:1 11 2014-05-02 09:07:12.344+01 10101043497459 item9:1 35 2014-05-03 11:07:12.339+01 10203019269975 item5:1|item1:1 789 2014-05-03 12:59:12.743+01 10401003346256 item7:3|item8:2|item9:1 77 2014-05-03 18:04:12.355+01 10203019262235 item5:2|item1:1 99 2014-05-04 00:36:39.713+01 10103044681799 item9:3|item1:1 33 2014-05-04 19:10:12.343+01 12345678901234 item5:1|item1:1 11 2014-05-05 09:07:12.344+01 12345678901235 item6:1|item1:1 35 2014-05-05 11:07:12.339+01 12345678901236 item5:2|item1:1 22 2014-05-05 12:59:12.743+01 12345678901237 item9:3|item1:1 77 2014-05-05 18:04:12.355+01 12345678901238 item8:3|item1:1 99 2014-05-05 20:36:39.713+01 12345678901239 item9:3|item1:1
-
创建f_orders的数据表
CREATE EXTERNAL TABLE f_orders ( user_id STRING , ts STRING , order_id STRING , items map<STRING,BIGINT> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '|' MAP KEYS TERMINATED BY ':'
-
将数组拆开查询
select user_id, order_id, item, amount from f_orders LATERAL VIEW explode(items) t AS item, amount
-
d_items的数据内容设计
item1 100.2 catalogA|catalogD|catalogX item2 200.3 catalogA item3 300.4 catalogA|catalogX item4 400.5 catalogB item5 500.6 catalogB|catalogX item6 600.7 catalogB item7 700.8 catalogC item8 800.9 catalogC|catalogD item9 899.99 catalogC|catalogA
-
d_item的数据表创建
CREATE EXTERNAL TABLE d_items ( item_sku STRING, price DOUBLE, catalogs array<STRING> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY '|'
-
d_item的查询语句
CREATE TABLE usr_cat AS select user_id, catalog, row_number() OVER (PARTITION BY user_id ORDER BY weight DESC) as row_num FROM usr_cat_weight order by user_id,row_num; FROM ( select orders.user_id, catalogs.catalog, sum(orders.amount) as weight from ( select user_id, item, amount from f_orders LATERAL VIEW explode(items) t AS item, amount ) orders join ( select item_sku, catalog from d_items LATERAL VIEW explode(catalogs) t AS catalog ) catalogs on (orders.item = catalogs.item_sku) group by orders.user_id, catalogs.catalog order by user_id, weight ) x ORDER BY user_id, row_num;
-
d_user 表的数据
11;m;1981-01-01;user11@gmail.com;2014-04-21 22;w;1982-01-01;user22@abcn.net;2014-04-22 33;m;1983-01-01;user33@fxlive.de;2014-04-23 77;w;1977-01-01;user77@fxlive.fr;2014-05-01 88;m;1988-01-01;user88@fxlive.eu;2014-05-02 99;w;1999-01-01;user99@abcn.net;2014-05-03
-
d_user的创建语句
CREATE EXTERNAL TABLE d_users ( user_id STRING , gender STRING , birthday STRING , email STRING , regday STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
-
对隐秘字段信息进行加密
select user_id, birthday, translate(birthday, '0123456789', '1234567890'), email, translate(email, 'userfxgmail1234567890', '1234567890userfxgmail') from d_users; CREATE TABLE user_segment AS select c.user_id, u.gender, u.age, c.catalogs from ( select user_id, group_concat(catalog, '|') as catalogs from usr_cat where row_num < 3 group by user_id ) c left outer join ( select user_id, gender, year(now()) - cast(substr(birthday, 1, 4) as int) as age from d_users ) u on (c.user_id = u.user_id) ;