数据蛙练习题:
现在有一张表t,这张表存储了每个员工每天的打卡情况,现在需要统计截止目前每个员工的连续打卡天数,如下表所示:
表1.jpg
上表中uid是用户id,tdate是日期,is_flag是记录用户当天是否打卡,1为打卡,0为未打卡。
我们希望得到的结果为:
result.jpg
建表代码如下:
drop table if exists clock_in;
create table clock_in(uid int, tdate datetime, is_flag int);
insert into clock_in values(1, '2020/2/1', 1);
insert into clock_in values(1, '2020/2/2', 0);
insert into clock_in values(1, '2020/2/3', 1);
insert into clock_in values(1, '2020/2/4', 1);
insert into clock_in values(1, '2020/2/5', 0);
insert into clock_in values(1, '2020/2/6', 1);
insert into clock_in values(1, '2020/2/7', 1);
insert into clock_in values(1, '2020/2/8', 1);
insert into clock_in values(2, '2020/2/1', 1);
insert into clock_in values(2, '2020/2/2', 0);
insert into clock_in values(2, '2020/2/3', 0);
insert into clock_in values(2, '2020/2/4', 1);
insert into clock_in values(2, '2020/2/5', 1);
insert into clock_in values(2, '2020/2/6', 1);
insert into clock_in values(2, '2020/2/7', 1);
insert into clock_in values(2, '2020/2/8', 1);
参考网上的答案,以一个连续的数字和打卡日期相减,但是当打卡日期跨月时如何计算呢?
我的思路是:
1、先按照日期排序,用窗口函数生成连续的数字 rn
2、筛选is_flag 为1的部分,再使用窗口函数生成连续数字 rm
3、1表和2表内连接后,rn - rm
4、对3表中的rn-rm进行分组计数,得到连续打卡天数
5、对4表的数据选择最大的打卡天数即可
SELECT
t2.uid uid,
max( t2.cc ) AS flag_days
FROM
(
SELECT
t.uid,
count( t.rn - t.rm ) AS cc
FROM
(
SELECT
a.uid,
a.tdate,
a.is_flag,
b.rn,
row_number () over ( PARTITION BY a.uid ORDER BY a.tdate ) AS rm
FROM
clock_in a
INNER JOIN ( SELECT uid, tdate, is_flag, row_number () over ( PARTITION BY uid ORDER BY tdate ) AS rn FROM clock_in ) b ON a.uid = b.uid
AND a.tdate = b.tdate
AND a.is_flag = 1
) t
GROUP BY
t.uid,
t.rn - t.rm
) t2
GROUP BY
t2.uid;