背景:
日常工作中有许多数据处理需求需要解决,在此之间,获得需求,用hive实现需求,最终完成任务
数据源:hive中的adventure_ods库的ods_sales_orders表
题目一:每个用户截止到每月为止的最大交易金额和累计到该月的总交易金额,结果数据格式如下:
思路:
1.对数据按照客户及其年-月分组
2.分组后就每月销售金额之和
3.使用窗口函数,对每个客户不同月份分组求最大值(max)和累计值(sum)
操作:
查看数据
select *
from ods_sales_orders
limit 5;
第一步:提取需要字段及按客户id、年月分组,求分组后的订单量及消费金额
select customer_key, substr(create_date, 1, 7) as umonth,
count(sales_order_key) as ucount,
sum(unit_price) as usum
from ods_sales_orders
group by customer_key, substr(create_date, 1, 7)
limit 10;
第二步:利用窗口函数,对客户按照月份排序,求最大金额及累积金额
select customer_key,umonth, ucount,
max(usum) over (partition by customer_key order by umonth) as current_max,
sum(usum) over (partition by customer_key order by umonth) as current_sum
from
(select customer_key, substr(create_date, 1, 7) as umonth,
count(sales_order_key) as ucount,
sum(unit_price) as usum
from ods_sales_orders
group by customer_key, substr(create_date, 1, 7)) as a
limit 10;