571. 给定数字的频率查询中位数
Numbers 表保存数字的值及其频率。
+----------+-------------+
| Number | Frequency |
+----------+-------------|
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+----------+-------------+
在此表中,数字为 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3,所以中位数是 (0 + 0) / 2 = 0。
+--------+
| median |
+--------|
| 0.0000 |
+--------+
请编写一个查询来查找所有数字的中位数并将结果命名为 median 。
569. 员工薪水中位数
Employee 表包含所有员工。Employee 表有三列:员工Id,公司名和薪水。
+-----+------------+--------+
|Id | Company | Salary |
+-----+------------+--------+
|1 | A | 2341 |
|2 | A | 341 |
|3 | A | 15 |
|4 | A | 15314 |
|5 | A | 451 |
|6 | A | 513 |
|7 | B | 15 |
|8 | B | 13 |
|9 | B | 1154 |
|10 | B | 1345 |
|11 | B | 1221 |
|12 | B | 234 |
|13 | C | 2345 |
|14 | C | 2645 |
|15 | C | 2645 |
|16 | C | 2652 |
|17 | C | 65 |
+-----+------------+--------+
请编写SQL查询来查找每个公司的薪水中位数。挑战点:你是否可以在不使用任何内置的SQL函数的情况下解决此问题。
+-----+------------+--------+
|Id | Company | Salary |
+-----+------------+--------+
|5 | A | 451 |
|6 | A | 513 |
|12 | B | 234 |
|9 | B | 1154 |
|14 | C | 2645 |
+-----+------------+--------+
SELECT
Id,
Company,
Salary
FROM
(
SELECT
Id,
Company,
Salary,
row_number() over(partition BY Company order by Salary) AS rk,
COUNT( *) over(partition BY Company) AS ct
FROM
Employee
)
a
WHERE
a.rk >= a.ct / 2
AND a.rk <= a.ct / 2 + 1
注意:mysql:5/2=3,spark:5/2=2.5,不影响结果。
1555. 银行账户概要
用户表: Users
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| user_id | int |
| user_name | varchar |
| credit | int |
+--------------+---------+
user_id 是这个表的主键。
表中的每一列包含每一个用户当前的额度信息。
交易表:Transactions
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| trans_id | int |
| paid_by | int |
| paid_to | int |
| amount | int |
| transacted_on | date |
+---------------+---------+
trans_id 是这个表的主键。
表中的每一列包含银行的交易信息。
ID 为 paid_by 的用户给 ID 为 paid_to 的用户转账。
力扣银行 (LCB) 帮助程序员们完成虚拟支付。我们的银行在表 Transaction 中记录每条交易信息,我们要查询每个用户的当前余额,并检查他们是否已透支(当前额度小于 0)。
写一条 SQL 语句,查询:
user_id 用户 ID
user_name 用户名
credit 完成交易后的余额
credit_limit_breached 检查是否透支 ("Yes" 或 "No")
以任意顺序返回结果表。
查询格式见如下示例:
Users 表:
+------------+--------------+-------------+
| user_id | user_name | credit |
+------------+--------------+-------------+
| 1 | Moustafa | 100 |
| 2 | Jonathan | 200 |
| 3 | Winston | 10000 |
| 4 | Luis | 800 |
+------------+--------------+-------------+
Transactions 表:
+------------+------------+------------+----------+---------------+
| trans_id | paid_by | paid_to | amount | transacted_on |
+------------+------------+------------+----------+---------------+
| 1 | 1 | 3 | 400 | 2020-08-01 |
| 2 | 3 | 2 | 500 | 2020-08-02 |
| 3 | 2 | 1 | 200 | 2020-08-03 |
+------------+------------+------------+----------+---------------+
结果表:
+------------+------------+------------+-----------------------+
| user_id | user_name | credit | credit_limit_breached |
+------------+------------+------------+-----------------------+
| 1 | Moustafa | -100 | Yes |
| 2 | Jonathan | 500 | No |
| 3 | Winston | 9900 | No |
| 4 | Luis | 800 | No |
+------------+------------+------------+-----------------------+
Moustafa 在 "2020-08-01" 支付了 200 ,当前额度 (100 -400 +200) = -500 并在 "2020-08-08" 支付了 500
Winston 在 "2020-08-01" 收到了 500 ,当前额度 (10000 +400 -500) = 800
SELECT DISTINCT
user_id AS USER_ID,
user_name AS USER_NAME,
credit + pay AS CREDIT,
(
CASE
WHEN (credit + pay) < 0
THEN 'Yes'
ELSE 'No'
END) AS CREDIT_LIMIT_BREACHED
FROM
(
SELECT
a.user_id,
a.user_name,
SUM(
CASE
WHEN a.user_id = b.paid_by
THEN - b.amount
WHEN a.user_id = b.paid_to
THEN b.amount
ELSE 0
END) over(partition BY a.user_id) AS pay,
a.credit
FROM
Users a
LEFT JOIN Transactions b
ON
a.user_id = b.paid_by
OR a.user_id = b.paid_to
) a
1412. 查找成绩处于中游的学生
表: Student
+---------------------+---------+
| Column Name | Type |
+---------------------+---------+
| student_id | int |
| student_name | varchar |
+---------------------+---------+
student_id 是该表主键.
student_name 学生名字.
表: Exam
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| exam_id | int |
| student_id | int |
| score | int |
+---------------+---------+
(exam_id, student_id) 是该表主键.
学生 student_id 在测验 exam_id 中得分为 score.
成绩处于中游的学生是指至少参加了一次测验, 且得分既不是最高分也不是最低分的学生。
写一个 SQL 语句,找出在所有测验中都处于中游的学生 (student_id, student_name)。
不要返回从来没有参加过测验的学生。返回结果表按照 student_id 排序。
查询结果格式如下。
Student 表:
+-------------+---------------+
| student_id | student_name |
+-------------+---------------+
| 1 | Daniel |
| 2 | Jade |
| 3 | Stella |
| 4 | Jonathan |
| 5 | Will |
+-------------+---------------+
Exam 表:
+------------+--------------+-----------+
| exam_id | student_id | score |
+------------+--------------+-----------+
| 10 | 1 | 70 |
| 10 | 2 | 80 |
| 10 | 3 | 90 |
| 20 | 1 | 80 |
| 30 | 1 | 70 |
| 30 | 3 | 80 |
| 30 | 4 | 90 |
| 40 | 1 | 60 |
| 40 | 2 | 70 |
| 40 | 4 | 80 |
+------------+--------------+-----------+
Result 表:
+-------------+---------------+
| student_id | student_name |
+-------------+---------------+
| 2 | Jade |
+-------------+---------------+
对于测验 1: 学生 1 和 3 分别获得了最低分和最高分。
对于测验 2: 学生 1 既获得了最高分, 也获得了最低分。
对于测验 3 和 4: 学生 1 和 4 分别获得了最低分和最高分。
学生 2 和 5 没有在任一场测验中获得了最高分或者最低分。
因为学生 5 从来没有参加过任何测验, 所以他被排除于结果表。
由此, 我们仅仅返回学生 2 的信息。
SELECT
b.student_id,
c.student_name
FROM
(
SELECT DISTINCT
student_id
FROM
Exam
WHERE
student_id NOT IN
(
SELECT
student_id
FROM
(
SELECT
exam_id,
student_id,
score,
MAX(score) over(partition BY exam_id) AS max_score,
MIN(score) over(partition BY exam_id) AS min_score
FROM
Exam
)
a
WHERE
a.score = a.max_score
OR a.score = a.min_score
)
)
b
JOIN Student c
ON
b.student_id = c.student_id
571. 给定数字的频率查询中位数
Numbers 表保存数字的值及其频率。
+----------+-------------+
| Number | Frequency |
+----------+-------------|
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+----------+-------------+
在此表中,数字为 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3,所以中位数是 (0 + 0) / 2 = 0。
+--------+
| median |
+--------|
| 0.0000 |
+--------+
请编写一个查询来查找所有数字的中位数并将结果命名为 median 。
如果 n1.Number 为中位数,n1.Number(包含本身)前累计的数字应大于等于总数/2 同时n1.Number(不包含本身)前累计数字应小于等于总数/2
例如:0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3 共12个数
中位数0(包含本身)前累计的数字 7 >=6 0(不包含本身)前累计数字 0 <=6
例如:0,0,0,3,3,3 共6个数
中位数0(包含本身)前累计的数字 3 >=3 0(不包含本身)前累计数字 0 <=3
中位数3(包含本身)前累计的数字 6 >=3 3(不包含本身)前累计数字 3 <=3
SELECT
AVG(a.Number) AS median
FROM
(
SELECT
NUMBER,
Frequency,
SUM(Frequency) over(order by NUMBER) AS cur_sum,
ifnull(SUM(Frequency) over(order by NUMBER rows BETWEEN unbounded preceding AND 1 preceding), 0) AS pre_sum,
SUM(Frequency) over() AS total
FROM
Numbers
)
a
WHERE
a.cur_sum >= a.total / 2
AND a.pre_sum <= a.total / 2
1097. 游戏玩法分析 V
难度困难12
SQL架构
Activity
活动记录表
<pre style="box-sizing: border-box; font-size: 13px; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, Courier, monospace; margin-top: 0px; margin-bottom: 1em; overflow: auto; background: rgba(var(--grey-9-rgb),0.04); padding: 10px 15px; color: rgba(var(--grey-9-rgb),1); line-height: 1.6; border-radius: 3px; white-space: pre-wrap;">+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id,event_date)是此表的主键
这张表显示了某些游戏的玩家的活动情况
每一行是一个玩家的记录,他在某一天使用某个设备注销之前登录并玩了很多游戏(可能是 0)
</pre>
我们将玩家的安装日期定义为该玩家的第一个登录日。
我们还将某个日期 X
的第 1 天留存时间定义为安装日期为 X
的玩家的数量,他们在 X
之后的一天重新登录,除以安装日期为 X
的玩家的数量,四舍五入到小数点后两位。
编写一个 SQL 查询,报告每个安装日期、当天安装游戏的玩家数量和第一天的留存时间。
查询结果格式如下所示:
<pre style="box-sizing: border-box; font-size: 13px; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, Courier, monospace; margin-top: 0px; margin-bottom: 1em; overflow: auto; background: rgba(var(--grey-9-rgb),0.04); padding: 10px 15px; color: rgba(var(--grey-9-rgb),1); line-height: 1.6; border-radius: 3px; white-space: pre-wrap;">Activity 表:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-01 | 0 |
| 3 | 4 | 2016-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result 表:
+------------+----------+----------------+
| install_dt | installs | Day1_retention |
+------------+----------+----------------+
| 2016-03-01 | 2 | 0.50 |
| 2017-06-25 | 1 | 0.00 |
+------------+----------+----------------+
玩家 1 和 3 在 2016-03-01 安装了游戏,但只有玩家 1 在 2016-03-02 重新登录,所以 2016-03-01 的第一天留存时间是 1/2=0.50
玩家 2 在 2017-06-25 安装了游戏,但在 2017-06-26 没有重新登录,因此 2017-06-25 的第一天留存时间为 0/1=0.00</pre>
select first_day as install_dt, count(distinct player_id) as installs,
round(sum(case when datediff(first_day,event_date)=-1 then 1 else 0 end)/count(distinct player_id),2) as Day1_retention from
(select player_id, event_date, min(event_date) over(partition by player_id) as first_day from Activity) a group by first_day
select first_day as install_dt, count(distinct player_id) as installs,
round(sum(case when datediff(first_day,event_date)=-1 then 1 else 0 end)/count(distinct player_id),2) as Day1_retention from
(select player_id, event_date, min(event_date) over(partition by player_id) as first_day from Activity) a group by first_day
datediff 前减去后
262. 行程和用户
难度困难176
SQL架构
Trips
表中存所有出租车的行程信息。每段行程有唯一键 Id,Client_Id 和 Driver_Id 是 Users
表中 Users_Id 的外键。Status 是枚举类型,枚举成员为 (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’)。
<pre style="box-sizing: border-box; font-size: 13px; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, Courier, monospace; margin-top: 0px; margin-bottom: 1em; overflow: auto; background: rgba(var(--grey-9-rgb),0.04); padding: 10px 15px; color: rgba(var(--grey-9-rgb),1); line-height: 1.6; border-radius: 3px; white-space: pre-wrap;">+----+-----------+-----------+---------+--------------------+----------+
| Id | Client_Id | Driver_Id | City_Id | Status |Request_at|
+----+-----------+-----------+---------+--------------------+----------+
| 1 | 1 | 10 | 1 | completed |2013-10-01|
| 2 | 2 | 11 | 1 | cancelled_by_driver|2013-10-01|
| 3 | 3 | 12 | 6 | completed |2013-10-01|
| 4 | 4 | 13 | 6 | cancelled_by_client|2013-10-01|
| 5 | 1 | 10 | 1 | completed |2013-10-02|
| 6 | 2 | 11 | 6 | completed |2013-10-02|
| 7 | 3 | 12 | 6 | completed |2013-10-02|
| 8 | 2 | 12 | 12 | completed |2013-10-03|
| 9 | 3 | 10 | 12 | completed |2013-10-03|
| 10 | 4 | 13 | 12 | cancelled_by_driver|2013-10-03|
+----+-----------+-----------+---------+--------------------+----------+
</pre>
Users
表存所有用户。每个用户有唯一键 Users_Id。Banned 表示这个用户是否被禁止,Role 则是一个表示(‘client’, ‘driver’, ‘partner’)的枚举类型。
<pre style="box-sizing: border-box; font-size: 13px; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, Courier, monospace; margin-top: 0px; margin-bottom: 1em; overflow: auto; background: rgba(var(--grey-9-rgb),0.04); padding: 10px 15px; color: rgba(var(--grey-9-rgb),1); line-height: 1.6; border-radius: 3px; white-space: pre-wrap;">+----------+--------+--------+
| Users_Id | Banned | Role |
+----------+--------+--------+
| 1 | No | client |
| 2 | Yes | client |
| 3 | No | client |
| 4 | No | client |
| 10 | No | driver |
| 11 | No | driver |
| 12 | No | driver |
| 13 | No | driver |
+----------+--------+--------+
</pre>
写一段 SQL 语句查出 **2013年10月1日 **至 **2013年10月3日 **期间非禁止用户的取消率。基于上表,你的 SQL 语句应返回如下结果,取消率(Cancellation Rate)保留两位小数。
取消率的计算方式如下:(被司机或乘客取消的非禁止用户生成的订单数量) / (非禁止用户生成的订单总数)
<pre style="box-sizing: border-box; font-size: 13px; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, Courier, monospace; margin-top: 0px; margin-bottom: 1em; overflow: auto; background: rgba(var(--grey-9-rgb),0.04); padding: 10px 15px; color: rgba(var(--grey-9-rgb),1); line-height: 1.6; border-radius: 3px; white-space: pre-wrap;">+------------+-------------------+
| Day | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
+------------+-------------------+</pre>
# Write your MySQL query statement below
select a.Request_at as Day,
round(sum(case when a.Status='cancelled_by_driver' or a.Status='cancelled_by_client' then 1 else 0 end)/count(*),2) as 'Cancellation Rate' from (select * from Trips where Request_at>='2013-10-01' and Request_at<='2013-10-03') a
join Users b on (a.Client_id=b.Users_Id and b.Banned='No')
join Users c on (a.Driver_id=c.Users_Id and c.Banned='No')
group by a.Request_at