SQL

1. posexplode

select date, sum(amount) as total_amount 
from ( 
  select date_add(start_date,id) as date, amount from tmp 
  lateral view posexplode(split(space(datediff(end_date,start_date)),' ')) t as   id,t_date 
) t 
group by date ;

reference: https://www.jianshu.com/p/d333e70c21d1

2. space

select split(space(10), ' ') from dual;

3. lag/lead

lag(col, n, default) 前n行数据

LAG(column_expression, offset, default_value) OVER (PARTITION BY partition_expression ORDER BY sort_expression)

lead(col, n, default) 后n行数据

LEAD(column_expression, offset, default_value) OVER (PARTITION BY partition_expression ORDER BY sort_expression)

https://zhuanlan.zhihu.com/p/397789258?utm_id=0

4. ntile

ntile(n) over(order by col) as bucket_num

n是指定的分箱数量。如果不能平均分配,则优先分配较小编号的箱,并且各个箱中能放的行数最多相差1。
备注:NULL值的处理,可以设置单独为一组,或者默认为最小值

5. first_value/last_value

first_value取分区内排序后,截止到当前行,第一个值
last_value分组内排序后,截止到当前行,最后一个值,如果要取分组内的最后一行,需要使用 rows between current row and unbounded following

select *,
first_value(logday) over(partition by userid order by logday) as co1,
last_value(logday) over(partition by userid order by logday) as co2,
last_value(logday) over(partition by userid order by logday rows between current row and unbounded following) as co3
from test.test_window;

6. rank/dense_rank

rank 会留下空位
dense_rank 不会留下空位

7. percent_rank()

当前行-1 / 当前组总行数-1

select  dept, userid, sal,
percent_rank() over(order by sal) as pr1
from cookie3;

结果分析:
d1  user1   1000    0.0     (1-1)/(5-1)=0.0
d1  user2   2000    0.25    (2-1)/(5-1)=0.25
d1  user3   3000    0.5     (3-1)/(5-1)=0.5
d2  user4   4000    0.75    (4-1)/(5-1)=0.75
d2  user5   5000    1.0     (5-1)/(5-1)=1.0

select  dept, userid, sal,
percent_rank() over(partition by dept order by sal) as pr2
from cookie3;

结果分析:
d1  user1   1000    0.0     (1-1)/(3-1)=0.0
d1  user2   2000    0.5     (2-1)/(3-1)=0.5
d1  user3   3000    1.0     (3-1)/(3-1)=1.0
d2  user4   4000    0.0     (1-1)/(2-1)=0.0
d2  user5   5000    1.0     (2-1)/(2-1)=1.0

8. cume_dist

小于(或大于)等于当前值的行数/分组内总行数

select dept, userid, sal,
cume_dist() over(order by sal) as cume1
from cookie3;

结果:
d1  user1   1000    0.2
d1  user2   2000    0.4
d1  user3   3000    0.6
d2  user4   4000    0.8
d2  user5   5000    1.0

9. hive 窗口函数中的rows和range的区分

sum(id) over(order by id range between 1 preceding and 2 following)
窗口大小设置为该分区内小于本记录id-1和id+2天的窗口
order by id,所以要看id的值,并对key(id进行range操作),即[id-1, id+2],注意是闭区间

https://blog.csdn.net/DolphinF/article/details/131695958
窗口函数使用:https://www.baispace.cn/article/hive-knowledge-window-function.html

10. 时间日期函数

https://www.cnblogs.com/lubians/p/17480959.html

11. array

https://blog.csdn.net/dupizi/article/details/131779370

12. limit offset

跳过offset行,取接下来的limit行。


image.png

参考:https://www.yiibai.com/sql/sql-limit.html

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容