Questions about Consecutive Numbers in MySQL

When writing queries based on business needs, we are very likely to create some queries, in which a certain number/ID appears multiple times consecutively, or ones, in which a number/date is increasing in an ascending order. We need to write an query to filter out these records.

For example, our boss requires us to find out active users that log in our game, let's say Hornor of Kings. In this setting, a particular user, let's call him user 1, may login in our game for consecutive 5 days. Then the data will look like this.

User_id date
1 2021-07-01
1 2021-07-02
1 2021-07-03
1 2021-07-04
1 2021-07-05

Assume that (user_id,date) is the primary key. We can call such kind of user active user.

The solution to this type of question can be summarized as follows:

  • Create a column, which we can call tag, to determine whether a number/id appears consecutively
    • To add such column, we can define variables and use window function
  • Create temporary table wrapping the query result. For example with temp as (query)
  • Filter out records that question asks us to find out based on the temp table.

Any method is unjustified and unclear if there is no question demonstrating it.

Question 1: Consecutive Numbers
Let's look at one problem from Leetcode: Please find numbers that appear at least three times consecutively.
Logs

Id Num
1 1
2 1
3 1
4 2
5 1
6 2
7 2

Id is the primary key of this table. Write an SQL query to find all numbers that appear at least three times consecutively. Return the result table in any order.

Solution1: Variables
Let me explain how to solve this question using variables. Remember that our objective is to create one column that serves the purpose of automatically calculating the frequency.

  • First define variable @num:=0,@cnt where := is assignment and @ is a variable keyword immediately followed by variable_name. So @num:=0 means that we assign 0 to the varaible @num
    • @num is used to record which number we are checking. It needs to be assigned to a value of new record each time the SQL select a new record. For example, when SQL moves from the first row to the second row, @id should be assigned to 1 in the second row.
    • @cnt is used to record the frequency that a number appears. For example, for the first three rows, we can find that Num has records 1, 1, and 1. So @cnt should be 1,2,3.
SELECT
  *
FROM Logs,(select @num:=0,@cnt:=0) t

In this sql, select @num:=0,@cnt:=0 is the syntax of defining variables. We have created two variables, num and cnt. Then we need to figure out the logic: @cnt can increase by 1 only when @num equals a record of Num.

SELECT
  *,
  case when @num=num then @cnt:=@cnt+1
           else @cnt:=1
  end as tag
FROM Logs,(select @num:=0,@cnt:=0) t
Id Num tag
1 1 1
2 1 2
3 1 3
4 2 1
5 1 1
6 2 1
7 2 2

Then we create a temporary table and find out the number that appears three times consecutively through tag column.

select
  distinct num ConsecutiveNums
from (SELECT
  *,
  case when @num=num then @cnt:=@cnt+1
           else @cnt:=1
  end as tag
FROM Logs,(select @num:=0,@cnt:=0) t) t1
where t1.tag>=3

Note: we must use distinct to drop any duplicated records. The reason for duplication is that a particular number may appear three times consecutively for id 1, 2, 3, and this can be the same case for id 7,8,9.

Solution2: Window Function
lag(exp,n) is used to return the value from the row that precedes the current row
lead(exp,n) is used to return the value from the row that succeeds the current row
So the general idea of using window function is that we find out the value from the row that precedes the current row and the value from the row that succeeds the current row. If the value of the three columns are equal, suggesting that the value appear at least three times consecutively.

  • First step: create two columns for the value of previous row and succeeding row
select
  num,
  lag(num,1) over (order by id asc)  as preceding_num,
  lead(num,1) over (order by id asc) as succeeding_num
from logs
  • Second Step: create a temporary table that stores the query result
with temp as (select num,
  lag(num,1) over (order by id asc)  as preceding_num,
  lead(num,1) over (order by id asc) as succeeding_num
from logs)
  • Third Step: filter out values from the temp table by num=preceding_num and num=succeeding_num
with temp as (select num,
  lag(num,1) over (order by id asc)  as preceding_num,
  lead(num,1) over (order by id asc) as succeeding_num
from logs)

select
  distinct num as ConsecutiveNums
from temp
where num = preceding_num
and num = succeeding_num;

Question 2: Active Users
Accounts

id name
1 Winston
7 Jonathan

Logins

id login_date
7 2020-05-30
1 2020-05-30
7 2020-05-31
7 2020-06-01
7 2020-06-02
7 2020-06-02
7 2020-06-03
1 2020-06-07
7 2020-06-10

No primary key in this table. (This suggests that there might be duplicated records. So we need to be cautious)

First, let's find out the difference between the question and the first question.
When we try to compute the frequency of a number appearing in the first question, we only need to examine whether the value of current row is the same as that from the preceding row and succeeding row. However, the second question requires us not only to examine this situation but also check whether the date is consecutive.

Then, let me give the solution using window function.
If we try to find out numbers appearing in consecutive dates, there is one fixed thought on the type of question: calculate the row_number for each record, and then subtract the date, which can be either date or number, from the row_number. If a number appear for consecutive days, meaning that a number appear multiple times at a given number/date increarsing by fixted step, the substraction will be equal for this number.
For example, we want to find out users log in for consecutive three days

id login_date
7 2020-05-31
7 2020-06-01
7 2020-06-02

We can find that the login_date, for id 7, is consecutive. In this case, the substraction will be 2020-05-30 for the three records, if I add one column row_number for id using window function.

id login_date row_number Diff
7 2020-05-31 1 2020-05-30
7 2020-06-01 2 2020-05-30
7 2020-06-02 3 2020-05-30

Then we can group the data by id,Diff and filter out any id with frequency(count(*)) greater than or equal 3.

So the procedures of solving this question can be summarize as follows:

  • Create row_number to compute the difference between row_number and login_date
select
  num,
  row_number() over (partition by id order by login_date asc)
from logins
  • Substrat the login_date from row_number to create one tag that help determine which user consecutively log in. Here we need to use date_sub(exp, interval x date_type) function when working with datetime data.
select
        *,
        date_sub(login_date,Interval row_number() over (partition by id order by login_date asc) DAY ) tag 
from (select distinct id,login_date from logins)
  • Group the table by id,diff to filter out active users
select
    distinct t1.id,a.name
from (select
        *,
        date_sub(login_date,Interval row_number() over (partition by id order by login_date asc) DAY ) tag 
    from (select distinct id,login_date from logins)t ) t1
join accounts a
on a.id = t1.id
group by t1.id,t1.tag
having count(t1.tag)>=5;

Note: we must use distinct keyword to drop duplicated records because of no primary key in login table.

There is one a little bit difficult question from Leetcode, You can use this question for practice this kind of question.

601. Q3: Human Traffic of Stadium

Write an SQL query to display the records with three or more rows with consecutive id's, and the number of people is greater than or equal to 100 for each.

Return the result table ordered by visit_date in ascending order. The query result format is in the following example.
Stadium

id visit_date people
1 2017-01-01 10
2 2017-01-02 109
3 2017-01-03 150
4 2017-01-04 99
5 2017-01-05 145
6 2017-01-06 1455
7 2017-01-07 199
8 2017-01-09 188

visit_date is the primary key for this table.
Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit.
No two rows will have the same visit_date, and as the id increases, the dates increase as well.

Here's the solution.(You can refer)

with temp as (select
                *,
                id - row_number() over (order by visit_date asc) as flag
            from stadium
            where people>=100
),
temp1 as (select
            flag
        from temp
        group by flag
        having count(*)>=3)
select
    id,
    visit_date,
    people
from temp
where flag in (select flag from temp1)
order by visit_date asc;
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容

  • ORACLE自学教程 --create tabletestone ( id number, --序号usernam...
    落叶寂聊阅读 4,818评论 0 0
  • 7题属于delete题型,很少用到,先跳过。 8.Rising Temperature id is the pri...
    喜欢捉迷藏的小可爱阅读 1,937评论 0 0
  • pyspark.sql模块 模块上下文 Spark SQL和DataFrames的重要类: pyspark.sql...
    mpro阅读 13,145评论 0 13
  • 我是黑夜里大雨纷飞的人啊 1 “又到一年六月,有人笑有人哭,有人欢乐有人忧愁,有人惊喜有人失落,有的觉得收获满满有...
    陌忘宇阅读 12,721评论 28 53
  • 人工智能是什么?什么是人工智能?人工智能是未来发展的必然趋势吗?以后人工智能技术真的能达到电影里机器人的智能水平吗...
    ZLLZ阅读 9,382评论 0 5