https://zhuanlan.zhihu.com/p/136915306
问题
1.订单的应答率、完单率分别是多少?
2.呼叫应答时间多长?
3.从这一周的数据来看,呼叫量最高的是哪一个小时(当地时间)?呼叫量最少的是哪一个小时(当地时间)?
4.呼叫订单第二天继续呼叫的比例有多少?
5.如果要对表中乘客进行分类,你认为需要参考哪一些因素?
答案:
1.订单的应答率、完单率分别是多少?
应答率 = 应答订单/呼叫订单
所有步骤的预处理:
1)时间列处理成datetime格式
update [exercise].[dbo].[20210128didi] set
call_time = cast(call_time as datetime),
grab_time = cast(grab_time as datetime),
cancel_time = cast(cancel_time as datetime),
finish_time = cast(finish_time as datetime)
2)将中国时间处理成巴西时间
--SQL SERVERL
update [exercise].[dbo].[20210128didi] set
call_time = DATEADD(hour, -11, call_time),
grab_time = DATEADD(hour, -11, grab_time),
cancel_time = DATEADD(hour, -11, cancel_time),
finish_time = DATEADD(hour, -11, finish_time)
--MySQL
update [exercise].[dbo].[20210128didi] set
call_time = date_sub(call_time, INTERVAL 11 hour),
grab_time = date_sub(grab_time, INTERVAL 11 hour),
finish_time = date_sub(finish_time, INTERVAL 11 hour)
DATEADD(SQL SERVER):https://www.jianshu.com/p/aa8f4ba3768e (日期函数汇总)
DATE_SUB(My SQL)
作用:DATE_SUB() 函数从日期减去指定的时间间隔。
语法:DATE_SUB(date,INTERVAL expr type)
date 参数是合法的日期表达式。
expr 参数是您希望减去的时间间隔。
type 时间间隔类型(包括second、minute、hour、day、week、month等等)
计算应答率 = 应答订单/呼叫订单
--SQL SERVER 想着详细一点,所以分子分母分开写了
--计算应答率
SELECT SUM(IIF(YEAR(grab_time)<>'1970',1,0)) as N'grab_orders',
count(call_time) as N'call_orders',
CAST(CAST(SUM(IIF(YEAR(grab_time)<>'1970',1,0))*1.0*100/count(call_time) AS decimal(10,2)) AS VARCHAR) +'%' as N'grab_rate'
FROM [exercise].[dbo].[20210128didi]
--计算完单率
Select SUM(IIF(YEAR(finish_time)<>1970,1,0)) as N'finish_orders',
Count(call_time) as N'call_orders',
CAST(CAST(SUM(IIF(YEAR(finish_time)<>1970,1,0))*100/Count(call_time) AS decimal(10,2)) AS varchar) + '%' AS N'finish_rate'
FROM [exercise].[dbo].[20210128didi]
--My SQL
select sum(if(year(grab_time)<>1970,1,0))/count(call_time)as '应答率',
sum(if(year(finish_time)<>1970,1,0))/count(call_time)as '完单率'
from didi
2.呼叫应答时间多长?
呼叫应答时间=被应答订单从呼叫到被应答时长总和/被应答订单数量
--SQL SERVER 使用DATEIFF
SELECT SUM(CAST(DATEDIFF(MINUTE,call_time,grab_time) AS decimal(10,2))) as N'GrabTotalTime',
SUM(IIF(YEAR(grab_time)<>'1970',1,0)) as N'grabOrders',
Cast(SUM(CAST(DATEDIFF(MINUTE,call_time,grab_time) AS decimal(10,2)))/SUM(IIF(YEAR(grab_time)<>'1970',1,0)) AS decimal(10,2)) AS N'GrabDuaration'
FROM [exercise].[dbo].[20210128didi]
WHERE YEAR(grab_time)<>1970
-- MY SQL 使用timestampdiff函数
select sum(TIMESTAMPDIFF(MINUTE,call_time,grab_time))/count(grab_time)as '呼叫应答时间'
from didi
where year(grab_time)<>1970;
3. 从这一周的数据来看,呼叫量最高的是哪一个小时(当地时间)?呼叫量最少的是哪一个小时(当地时间)?
我先查了一下 给的测试数据源确实是一周的量
前面我们已经处理好了当地时间,接下来直接做就可以了
--SQL SERVER 有个缺点 万一有重复值,再研究下开窗函数在这怎么用
--最高
SELECT TOP 1 *
FROM (
SELECT DISTINCT DATEPART(hour, call_time) AS _hour, COUNT(call_time) AS _times
FROM [exercise].[dbo].[20210128didi]
GROUP BY DATEPART(hour, call_time)
) a
ORDER BY _times DESC
--最低
SELECT TOP 1 *
FROM (
SELECT DISTINCT DATEPART(hour, call_time) AS _hour, COUNT(call_time) AS _times
FROM [exercise].[dbo].[20210128didi]
GROUP BY DATEPART(hour, call_time)
) a
ORDER BY _times
--MY SQL
-- 新增一列
alter table didi add column call_time_hour VARCHAR(255);
-- 使用substr函数做字符串截取,为新列赋值
update didi set call_time_hour = SUBSTR(call_time from 12 for 2);
-- 方法2:使用date_format函数转换格式
update didi set call_time_hour = DATE_FORMAT(call_time,'%k')
-- 找出呼叫量最高的小时,显示2行防止出现重复值。
select call_time_hour,count(call_time)as'呼叫量'
from didi
GROUP BY call_time_hour
ORDER BY count(call_time) desc
limit 2;
-- 找出呼叫量最少的小时,显示3行确认是否有第三个相等值。
select call_time_hour,count(call_time)as'呼叫量'
from didi
GROUP BY call_time_hour
ORDER BY count(call_time) asc
limit 3;
4.呼叫订单第二天继续呼叫的比例有多少?
--SQL SERVER
SELECT CAST(CAST(COUNT(DISTINCT a.order_id) * 100 / (
SELECT COUNT(DISTINCT order_id)
FROM [exercise].[dbo].[20210128didi]
) AS decimal(10, 2)) AS varchar) + '%' AS '第二天继续呼叫比例'
FROM [exercise].[dbo].[20210128didi] a
JOIN [exercise].[dbo].[20210128didi] b ON a.passenger_id = b.passenger_id
WHERE datediff(day, a.call_time, b.call_time) = 1
--MY SQL
select count(DISTINCT a.order_id)/(select count(DISTINCT order_id) from didi)as'第二天继续呼叫比例'
from didi a join didi b
on a.passenger_id = b.passenger_id
where datediff(a.call_time,b.call_time)=1;
5.如果要对表中乘客进行分类,你认为需要参考哪一些因素?
基于已有的表格中数据,可以参RFM模型对用户进行分类,
R:乘客上一次打车距离3月11日的时间间隔
F:乘客在数据期间的打车频率
M:打车消费金额(表中无打车金额,可以用完成订单总时长代替)