MySQL求连续登陆天数-鹅厂面试题

一.需求

求每个用户的最长连续登录天数,两个日期的间隔小于或等于 3 均视为连续登录。比如 01-01 号登录,最近的下一次登录是 01-04 号,两个日期的间隔等于 3 天,因此这两个日期之间的天数都算作活跃天数,一共 4 天。

因为考虑MySQL版本问题,暂时不能使用分析函数。

二.解决方案

测试数据:

create table test_login(user_id int,login_date timestamp);

insert into test_login values (1,'2020-01-01 00:01:00');
insert into test_login values (1,'2020-01-01 00:02:00');
insert into test_login values (1,'2020-01-01 00:03:00');
insert into test_login values (1,'2020-01-02 00:03:00');
insert into test_login values (1,'2020-01-05 00:03:00');
insert into test_login values (1,'2020-01-07 00:03:00');
insert into test_login values (1,'2020-01-11 00:03:00');
insert into test_login values (1,'2020-01-12 00:03:00');
insert into test_login values (1,'2020-01-13 00:03:00');
insert into test_login values (1,'2020-01-14 00:03:00');
insert into test_login values (1,'2020-01-17 00:03:00');
insert into test_login values (1,'2020-01-18 00:03:00');
insert into test_login values (1,'2020-01-19 00:03:00');
insert into test_login values (1,'2020-01-22 00:03:00');

insert into test_login values (2,'2020-01-01 00:04:00');
insert into test_login values (2,'2020-01-01 00:05:00');
insert into test_login values (2,'2020-01-01 00:06:00');
insert into test_login values (2,'2020-01-02 00:06:00');
insert into test_login values (2,'2020-01-07 00:06:00');
insert into test_login values (2,'2020-01-11 00:06:00');
insert into test_login values (2,'2020-01-12 00:06:00');
insert into test_login values (2,'2020-01-14 00:06:00');
insert into test_login values (2,'2020-01-18 00:06:00');
insert into test_login values (2,'2020-01-19 00:06:00');
insert into test_login values (2,'2020-01-22 00:06:00');

代码

SELECT user_id,
       max(diff_days) + 1 max_diff_days
  from 
(
SELECT user_id,
       flag2,
       min(login_date) min_login_date,
       max(login_date) max_login_date,
       datediff(max(login_date),min(login_date)) diff_days
  from 
(
SELECT tmp3.user_id,
       tmp3.login_date,
       tmp3.flag1,
       sum(tmp4.flag1) flag2
  from 
(
SELECT user_id,
       login_date,
       case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
  from 
(
SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
) tmp2  -- 通过标量子查询实现上一天的登陆日期
) tmp3  -- 打标记 如上下间隔<=3 则为1  否则为0
left join 
(
SELECT user_id,
       login_date,
       case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
  from 
(
SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
) tmp2  -- 通过标量子查询实现上一天的登陆日期
) tmp4  -- 功能同tmp3 
on  tmp3.user_id = tmp4.user_id
and tmp4.login_date <= tmp3.login_date
group by tmp3.user_id,tmp3.login_date,tmp3.flag1
) tmp5 -- 每一个用户  每一个连续区间设置为不同的flag  标记为flag2
group by user_id,flag2
) tmp6 -- 计算出每个连续区间 最大值和最小值  以及连续间隔天数
group by user_id

这样看起来是不是觉得非常的复杂,下面我们拆解开

2.1 同一天多次登陆去重

考虑同一天会有多次登陆的,此处用group by语句进行去重,同一天只保留一条记录

代码:

SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')

测试记录:

mysql> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d');
+---------+------------+
| user_id | login_date |
+---------+------------+
|       1 | 2020-01-01 |
|       1 | 2020-01-02 |
|       1 | 2020-01-05 |
|       1 | 2020-01-07 |
|       1 | 2020-01-11 |
|       1 | 2020-01-12 |
|       1 | 2020-01-13 |
|       1 | 2020-01-14 |
|       1 | 2020-01-17 |
|       1 | 2020-01-18 |
|       1 | 2020-01-19 |
|       1 | 2020-01-22 |
|       2 | 2020-01-01 |
|       2 | 2020-01-02 |
|       2 | 2020-01-07 |
|       2 | 2020-01-11 |
|       2 | 2020-01-12 |
|       2 | 2020-01-14 |
|       2 | 2020-01-18 |
|       2 | 2020-01-19 |
|       2 | 2020-01-22 |
+---------+------------+
21 rows in set (0.00 sec)

mysql> 

2.2 求上一次的登陆日期

我们需要找到上一次登陆日期,与此次的登陆日期对比,才可以判断是否符合小于等于3天的条件
此处,我们可以使用标量子查询来实现
需要记录的每个用户的第一条登陆信息上一天为null

代码:

SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条

测试记录:

mysql> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ;
+---------+------------+---------------+
| user_id | login_date | up_login_date |
+---------+------------+---------------+
|       1 | 2020-01-01 | NULL          |
|       1 | 2020-01-02 | 2020-01-01    |
|       1 | 2020-01-05 | 2020-01-02    |
|       1 | 2020-01-07 | 2020-01-05    |
|       1 | 2020-01-11 | 2020-01-07    |
|       1 | 2020-01-12 | 2020-01-11    |
|       1 | 2020-01-13 | 2020-01-12    |
|       1 | 2020-01-14 | 2020-01-13    |
|       1 | 2020-01-17 | 2020-01-14    |
|       1 | 2020-01-18 | 2020-01-17    |
|       1 | 2020-01-19 | 2020-01-18    |
|       1 | 2020-01-22 | 2020-01-19    |
|       2 | 2020-01-01 | NULL          |
|       2 | 2020-01-02 | 2020-01-01    |
|       2 | 2020-01-07 | 2020-01-02    |
|       2 | 2020-01-11 | 2020-01-07    |
|       2 | 2020-01-12 | 2020-01-11    |
|       2 | 2020-01-14 | 2020-01-12    |
|       2 | 2020-01-18 | 2020-01-14    |
|       2 | 2020-01-19 | 2020-01-18    |
|       2 | 2020-01-22 | 2020-01-19    |
+---------+------------+---------------+
21 rows in set (0.00 sec)

2.3 打标记判断是否间隔天数<=3

求出上一个登陆日期,与本次登陆日期比较,如果<=3,则标记为0,否则标记为1
需要注意的是上一步的空值直接标记为1
这样通过flag1 我们就可以看到连续登陆的区间了

代码:

SELECT user_id,
       login_date,
       case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
  from 
(
SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
) tmp2  -- 通过标量子查询实现上一天的登陆日期

测试记录:

mysql> SELECT user_id,
    ->        login_date,
    ->        case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
    ->   from 
    -> (
    -> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ) tmp2  -- 通过标量子查询实现上一天的登陆日期
    -> ;
+---------+------------+-------+
| user_id | login_date | flag1 |
+---------+------------+-------+
|       1 | 2020-01-01 |     1 |
|       1 | 2020-01-02 |     0 |
|       1 | 2020-01-05 |     0 |
|       1 | 2020-01-07 |     0 |
|       1 | 2020-01-11 |     1 |
|       1 | 2020-01-12 |     0 |
|       1 | 2020-01-13 |     0 |
|       1 | 2020-01-14 |     0 |
|       1 | 2020-01-17 |     0 |
|       1 | 2020-01-18 |     0 |
|       1 | 2020-01-19 |     0 |
|       1 | 2020-01-22 |     0 |
|       2 | 2020-01-01 |     1 |
|       2 | 2020-01-02 |     0 |
|       2 | 2020-01-07 |     1 |
|       2 | 2020-01-11 |     1 |
|       2 | 2020-01-12 |     0 |
|       2 | 2020-01-14 |     0 |
|       2 | 2020-01-18 |     1 |
|       2 | 2020-01-19 |     0 |
|       2 | 2020-01-22 |     0 |
+---------+------------+-------+
21 rows in set (0.01 sec)

2.4 打分组的标记

因为同一个用户存在多个连续登陆的区间,根据上一个步骤的flag1没办法区分开,此时需要区分开同一个用户的不同连续登陆区间。

因为MySQL版本不支持分析函数,只能通过临时表表连接的方式实现,再考虑去除重复,需要用到分组语句进行去重。

其实flag2是这个解决方案核心所在,同一个用户第一次连续登陆区间标记为1,第二次则累加为2,以此类推。

代码:

SELECT tmp3.user_id,
       tmp3.login_date,
       tmp3.flag1,
       sum(tmp4.flag1) flag2
  from 
(
SELECT user_id,
       login_date,
       case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
  from 
(
SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
) tmp2  -- 通过标量子查询实现上一天的登陆日期
) tmp3  -- 打标记 如上下间隔<=3 则为1  否则为0
left join 
(
SELECT user_id,
       login_date,
       case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
  from 
(
SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
) tmp2  -- 通过标量子查询实现上一天的登陆日期
) tmp4  -- 功能同tmp3 
on  tmp3.user_id = tmp4.user_id
and tmp4.login_date <= tmp3.login_date
group by tmp3.user_id,tmp3.login_date,tmp3.flag1

测试记录:

mysql> SELECT tmp3.user_id,
    ->        tmp3.login_date,
    ->        tmp3.flag1,
    ->        sum(tmp4.flag1) flag2
    ->   from 
    -> (
    -> SELECT user_id,
    ->        login_date,
    ->        case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
    ->   from 
    -> (
    -> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ) tmp2  -- 通过标量子查询实现上一天的登陆日期
    -> ) tmp3  -- 打标记 如上下间隔<=3 则为1  否则为0
    -> left join 
    -> (
    -> SELECT user_id,
    ->        login_date,
    ->        case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
    ->   from 
    -> (
    -> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ) tmp2  -- 通过标量子查询实现上一天的登陆日期
    -> ) tmp4  -- 功能同tmp3 
    -> on  tmp3.user_id = tmp4.user_id
    -> and tmp4.login_date <= tmp3.login_date
    -> group by tmp3.user_id,tmp3.login_date,tmp3.flag1;
+---------+------------+-------+-------+
| user_id | login_date | flag1 | flag2 |
+---------+------------+-------+-------+
|       1 | 2020-01-01 |     1 |     1 |
|       1 | 2020-01-02 |     0 |     1 |
|       1 | 2020-01-05 |     0 |     1 |
|       1 | 2020-01-07 |     0 |     1 |
|       1 | 2020-01-11 |     1 |     2 |
|       1 | 2020-01-12 |     0 |     2 |
|       1 | 2020-01-13 |     0 |     2 |
|       1 | 2020-01-14 |     0 |     2 |
|       1 | 2020-01-17 |     0 |     2 |
|       1 | 2020-01-18 |     0 |     2 |
|       1 | 2020-01-19 |     0 |     2 |
|       1 | 2020-01-22 |     0 |     2 |
|       2 | 2020-01-01 |     1 |     1 |
|       2 | 2020-01-02 |     0 |     1 |
|       2 | 2020-01-07 |     1 |     2 |
|       2 | 2020-01-11 |     1 |     3 |
|       2 | 2020-01-12 |     0 |     3 |
|       2 | 2020-01-14 |     0 |     3 |
|       2 | 2020-01-18 |     1 |     4 |
|       2 | 2020-01-19 |     0 |     4 |
|       2 | 2020-01-22 |     0 |     4 |
+---------+------------+-------+-------+
21 rows in set (0.01 sec)

2.5 分组求最大的间隔天数

其实有了上一个步骤的flag2,就可以判断区分开每一个用户的每一个连续登陆区间,直接进行分组就可以求出该连续区间 最大和最小的登陆日期,两个日期差值就是间隔天数。最后根据user_id进行分组,求最大的间隔天数就是最终结果。

需要审题 01-01到01-04 间隔是3天,但是算4天连续登陆 所以datediff函数的结果要+1

代码:

SELECT user_id,
       max(diff_days) +1 max_diff_days
  from 
(
SELECT user_id,
       flag2,
       min(login_date) min_login_date,
       max(login_date) max_login_date,
       datediff(max(login_date),min(login_date)) diff_days
  from 
(
SELECT tmp3.user_id,
       tmp3.login_date,
       tmp3.flag1,
       sum(tmp4.flag1) flag2
  from 
(
SELECT user_id,
       login_date,
       case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
  from 
(
SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
) tmp2  -- 通过标量子查询实现上一天的登陆日期
) tmp3  -- 打标记 如上下间隔<=3 则为1  否则为0
left join 
(
SELECT user_id,
       login_date,
       case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
  from 
(
SELECT  user_id,
        login_date,
        (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
from 
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
) tmp2  -- 通过标量子查询实现上一天的登陆日期
) tmp4  -- 功能同tmp3 
on  tmp3.user_id = tmp4.user_id
and tmp4.login_date <= tmp3.login_date
group by tmp3.user_id,tmp3.login_date,tmp3.flag1
) tmp5 -- 每一个用户  每一个连续区间设置为不同的flag  标记为flag2
group by user_id,flag2
) tmp6 -- 计算出每个连续区间 最大值和最小值  以及连续间隔天数
group by user_id

测试记录:

mysql> SELECT user_id,
    ->        flag2,
    ->        min(login_date) min_login_date,
    ->        max(login_date) max_login_date,
    ->        datediff(max(login_date),min(login_date)) diff_days
    ->   from 
    -> (
    -> SELECT tmp3.user_id,
    ->        tmp3.login_date,
    ->        tmp3.flag1,
    ->        sum(tmp4.flag1) flag2
    ->   from 
    -> (
    -> SELECT user_id,
    ->        login_date,
    ->        case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
    ->   from 
    -> (
    -> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ) tmp2  -- 通过标量子查询实现上一天的登陆日期
    -> ) tmp3  -- 打标记 如上下间隔<=3 则为1  否则为0
    -> left join 
    -> (
    -> SELECT user_id,
    ->        login_date,
    ->        case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
    ->   from 
    -> (
    -> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ) tmp2  -- 通过标量子查询实现上一天的登陆日期
    -> ) tmp4  -- 功能同tmp3 
    -> on  tmp3.user_id = tmp4.user_id
    -> and tmp4.login_date <= tmp3.login_date
    -> group by tmp3.user_id,tmp3.login_date,tmp3.flag1
    -> ) tmp5 -- 每一个用户  每一个连续区间设置为不同的flag  标记为flag2
    -> group by user_id,flag2;
+---------+-------+----------------+----------------+-----------+
| user_id | flag2 | min_login_date | max_login_date | diff_days |
+---------+-------+----------------+----------------+-----------+
|       1 |     1 | 2020-01-01     | 2020-01-07     |         6 |
|       1 |     2 | 2020-01-11     | 2020-01-22     |        11 |
|       2 |     1 | 2020-01-01     | 2020-01-02     |         1 |
|       2 |     2 | 2020-01-07     | 2020-01-07     |         0 |
|       2 |     3 | 2020-01-11     | 2020-01-14     |         3 |
|       2 |     4 | 2020-01-18     | 2020-01-22     |         4 |
+---------+-------+----------------+----------------+-----------+
6 rows in set (0.01 sec)

mysql>


mysql> SELECT user_id,
    ->        max(diff_days) +1 max_diff_days
    ->   from 
    -> (
    -> SELECT user_id,
    ->        flag2,
    ->        min(login_date) min_login_date,
    ->        max(login_date) max_login_date,
    ->        datediff(max(login_date),min(login_date)) diff_days
    ->   from 
    -> (
    -> SELECT tmp3.user_id,
    ->        tmp3.login_date,
    ->        tmp3.flag1,
    ->        sum(tmp4.flag1) flag2
    ->   from 
    -> (
    -> SELECT user_id,
    ->        login_date,
    ->        case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
    ->   from 
    -> (
    -> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ) tmp2  -- 通过标量子查询实现上一天的登陆日期
    -> ) tmp3  -- 打标记 如上下间隔<=3 则为1  否则为0
    -> left join 
    -> (
    -> SELECT user_id,
    ->        login_date,
    ->        case when up_login_date is null then 1 else case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1 end end as flag1
    ->   from 
    -> (
    -> SELECT  user_id,
    ->         login_date,
    ->         (select max(date_format(login_date,'%Y-%m-%d')) from test_login t2 where t2.user_id = tmp1.user_id and t2.login_date < tmp1.login_date ) as up_login_date
    -> from 
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login 
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ) tmp1  -- tmp1表 分组去重 一天多次登陆的算一条
    -> ) tmp2  -- 通过标量子查询实现上一天的登陆日期
    -> ) tmp4  -- 功能同tmp3 
    -> on  tmp3.user_id = tmp4.user_id
    -> and tmp4.login_date <= tmp3.login_date
    -> group by tmp3.user_id,tmp3.login_date,tmp3.flag1
    -> ) tmp5 -- 每一个用户  每一个连续区间设置为不同的flag  标记为flag2
    -> group by user_id,flag2
    -> ) tmp6 -- 计算出每个连续区间 最大值和最小值  以及连续间隔天数
    -> group by user_id
    -> ;
+---------+---------------+
| user_id | max_diff_days |
+---------+---------------+
|       1 |            12 |
|       2 |             5 |
+---------+---------------+
2 rows in set (0.00 sec)

mysql> 
 

三.MySQL 8.0写法

MySQL 8.0的with语句以及分析窗口函数,可以使上面的解决方案的代码简洁度大大提升

代码:

with tmp1 AS
-- tmp1临时表 一天多次登陆算一次
(
SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
  from test_login 
 group by user_id,date_format(login_date,'%Y-%m-%d')
),
tmp2 as
-- tmp2临时表  通过lag获取上一次登陆时间  如是第一条给默认值
(
select user_id,
       login_date,
       lag(login_date,1,'1900-01-01') over(partition by user_id order by login_date) up_login_date
  from tmp1
),
tmp3 AS
-- 判断是否符合3天内标准  打标记flag1
(
select  user_id,
        login_date,
        case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1  end as flag1
  from  tmp2
),
tmp4 AS
-- 通过分析函数将每个用户 每一个连续登陆期间进行标记 flag2
(
select  user_id,
        login_date,
        flag1,
        sum(flag1) over(partition by user_id order by login_date) as flag2
  from tmp3
),
tmp5 AS
-- 通过user_id flag2进行分组
(
SELECT user_id,
       flag2,
       min(login_date) min_login_date,
       max(login_date) max_login_date,
       datediff(max(login_date),min(login_date)) diff_days
  from tmp4
 group by user_id,flag2
)
SELECT user_id,
       max(diff_days) + 1 as diff_days
  from tmp5
 group by user_id;

测试记录:

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.19    |
+-----------+
1 row in set (0.00 sec)

mysql> with tmp1 AS
    -> -- tmp1临时表 一天多次登陆算一次
    -> (
    -> SELECT user_id,date_format(login_date,'%Y-%m-%d') login_date
    ->   from test_login
    ->  group by user_id,date_format(login_date,'%Y-%m-%d')
    -> ),
    -> tmp2 as
    -> -- tmp2临时表  通过lag获取上一次登陆时间  如是第一条给默认值
    -> (
    -> select user_id,
    ->        login_date,
    ->        lag(login_date,1,'1900-01-01') over(partition by user_id order by login_date) up_login_d
    ->   from tmp1
    -> ),
    -> tmp3 AS
    -> -- 判断是否符合3天内标准  打标记flag1
    -> (
    -> select  user_id,
    ->         login_date,
    ->         case when DATEDIFF(login_date,up_login_date) <= 3 then 0 else 1  end as flag1
    ->   from  tmp2
    -> ),
    -> tmp4 AS
    -> -- 通过分析函数将每个用户 每一个连续登陆期间进行标记 flag2
    -> (
    -> select  user_id,
    ->         login_date,
    ->         flag1,
    ->         sum(flag1) over(partition by user_id order by login_date) as flag2
    ->   from tmp3
    -> ),
    -> tmp5 AS
    -> -- 通过user_id flag2进行分组
    -> (
    -> SELECT user_id,
    ->        flag2,
    ->        min(login_date) min_login_date,
    ->        max(login_date) max_login_date,
    ->        datediff(max(login_date),min(login_date)) diff_days
    ->   from tmp4
    ->  group by user_id,flag2
    -> )
    -> SELECT user_id,
    ->        max(diff_days) + 1 as diff_days
    ->   from tmp5
    ->  group by user_id;
+---------+-----------+
| user_id | diff_days |
+---------+-----------+
|       1 |        12 |
|       2 |         5 |
+---------+-----------+
2 rows in set (0.00 sec)

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 214,922评论 6 497
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,591评论 3 389
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 160,546评论 0 350
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,467评论 1 288
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,553评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,580评论 1 293
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,588评论 3 414
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,334评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,780评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,092评论 2 330
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,270评论 1 344
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,925评论 5 338
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,573评论 3 322
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,194评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,437评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,154评论 2 366
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,127评论 2 352

推荐阅读更多精彩内容