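On Sunday I wrote up how to compute daily retention in MySQL (see "MySQL留存分析"). Last night, though, I could not fall asleep. I kept feeling that something was unfinished, and somewhere in my dreams I realized that weekly and monthly retention were still missing. When I got up in the morning I thought: for weekly retention, just swap the date for the week number of the year, and do the same for months. Once I was properly awake I saw the problem: that only works within a single year and breaks completely as soon as the data crosses a year boundary, and real data always spans multiple years. I could not think of a fix, searched all sorts of keywords on the subway, and still found nothing relevant, so I spent the whole morning feeling gloomy. After a hurried lunch, while walking in the shade, I suddenly remembered the date table I have used for years. Week numbers restart every year (after week 52 or 53), so they cannot accumulate on their own, but nothing stops me from accumulating them myself and joining the result in as a dimension table. My steps instantly felt lighter.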
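So I left work "early" at 9:30 pm, went home, and added two columns to the date dimension table: a week number (weeknum) and a month number (monthnum). I count from the year 2000 as week 0 and month 0 and keep adding from there, which gives two helper columns that never reset. Weekly and monthly retention then come down to a simple subtraction.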
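How the two columns get filled is not shown here. As a rough sketch only, assuming the dimension table is named `date` with a DATE column `日期` (as in the queries in this post) and that week 0 is the Monday-based week containing 2000-01-01, they could be populated like this. The anchor date 1999-12-27 (the Monday of that week) is my assumption, not something stated in the post:

```sql
-- Hypothetical setup: add the two cumulative helper columns.
ALTER TABLE `date`
    ADD COLUMN weeknum INT,
    ADD COLUMN monthnum INT;

-- Note: an UPDATE without WHERE is rejected when safe-update mode is enabled.
UPDATE `date`
SET
    -- whole weeks elapsed since Monday 1999-12-27 (assumed start of week 0)
    weeknum  = FLOOR(DATEDIFF(`日期`, '1999-12-27') / 7),
    -- whole calendar months elapsed since January 2000 (month 0)
    monthnum = (YEAR(`日期`) - 2000) * 12 + MONTH(`日期`) - 1;
```

With this numbering, 2020-12-28 and 2021-01-04 fall in consecutive weeks, so their weeknum values differ by exactly 1 even though the year changes.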
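With that idea in place, the only change from yesterday's daily retention query is one extra step: joining each activity date to the date table. The SQL is as follows ↓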
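SELECT
    lc1.id,
    lc1.user_id,
    DATE(lc1.time) date1,           -- date of the base activity
    DATE(lc2.time) date2,           -- date of each activity by the same user
    d1.`周次`,                       -- week of date1, used for grouping later
    d1.weeknum d1wn,                -- cumulative week number of date1
    d2.weeknum d2wn,                -- cumulative week number of date2
    d2.weeknum - d1.weeknum wdiff   -- weeks between the two activities
FROM
    liucun AS lc1
    LEFT JOIN date AS d1 ON DATE(lc1.time) = d1.日期
    LEFT JOIN liucun AS lc2 ON lc1.user_id = lc2.user_id
    LEFT JOIN date AS d2 ON DATE(lc2.time) = d2.日期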
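From here it follows the daily approach: aggregate by week (周次), use the difference in week numbers to count the users who came back N weeks later, and divide by the week's user count to get the N-week retention rate. The SQL is as follows ↓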
SELECT
    周次,
    COUNT(DISTINCT user_id) 当周用户数,
    -- share of the week's users who were active again exactly N weeks later
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN wdiff=1 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 次周留存率,
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN wdiff=2 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 两周留存率,
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN wdiff=3 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 三周留存率
FROM
    (SELECT
        lc1.id,
        lc1.user_id,
        DATE(lc1.time) date1,
        DATE(lc2.time) date2,
        d1.`周次`,
        d1.weeknum d1wn,
        d2.weeknum d2wn,
        d2.weeknum - d1.weeknum wdiff
    FROM
        liucun AS lc1
        LEFT JOIN date AS d1 ON DATE(lc1.time) = d1.日期
        LEFT JOIN liucun AS lc2 ON lc1.user_id = lc2.user_id
        LEFT JOIN date AS d2 ON DATE(lc2.time) = d2.日期) temp
GROUP BY
    周次
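The self-join above pairs every activity row of a user with every other row of that user, which can grow quickly for active users. One possible variation, shown only as a sketch under the same table and column assumptions (`liucun`, `date`, `日期`, `weeknum`), reduces each side to distinct (user_id, weeknum) pairs before joining. It groups by `weeknum` rather than `周次`, returns raw counts instead of formatted percentages, and its output column 次周留存用户数 is a name I made up for illustration:

```sql
SELECT
    a.weeknum,
    COUNT(DISTINCT a.user_id) AS 当周用户数,
    -- users seen again exactly one week later
    COUNT(DISTINCT CASE WHEN b.weeknum - a.weeknum = 1 THEN a.user_id END) AS 次周留存用户数
FROM
    (SELECT DISTINCT lc.user_id, d.weeknum
     FROM liucun AS lc
     JOIN `date` AS d ON DATE(lc.time) = d.日期) AS a
    LEFT JOIN
    (SELECT DISTINCT lc.user_id, d.weeknum
     FROM liucun AS lc
     JOIN `date` AS d ON DATE(lc.time) = d.日期) AS b
    ON a.user_id = b.user_id
GROUP BY
    a.weeknum;
```

Dividing the second count by the first gives the same next-week rate as the query above, since both count distinct users per base week.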
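Since this data does not cross a year boundary, the same result can be obtained without the week helper table. The SQL is as follows, and the result matches the one above ↓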
SELECT
    -- %x is the year that belongs to week %v; %Y can mislabel days around New Year
    DATE_FORMAT(date1,'%x-w%v') week,
    COUNT(DISTINCT user_id) 当周用户数,
    -- the week-of-year difference is only valid while both dates are in the same year
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN DATE_FORMAT(date2,'%v')-DATE_FORMAT(date1,'%v')=1 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 次周留存率,
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN DATE_FORMAT(date2,'%v')-DATE_FORMAT(date1,'%v')=2 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 两周留存率,
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN DATE_FORMAT(date2,'%v')-DATE_FORMAT(date1,'%v')=3 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 三周留存率
FROM
    (SELECT
        lc1.id,
        lc1.user_id,
        DATE(lc1.time) date1,
        DATE(lc2.time) date2
    FROM
        liucun AS lc1
        LEFT JOIN liucun AS lc2 ON lc1.user_id = lc2.user_id) temp
GROUP BY
    DATE_FORMAT(date1,'%x-w%v')
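To see why this shortcut only holds within a single year, it may help to run the week-of-year subtraction on two Mondays that are exactly one week apart but sit on either side of New Year. The two dates are arbitrary examples I picked, not data from the post:

```sql
SELECT
    DATE_FORMAT('2020-12-28', '%v') AS week_a,       -- '53'
    DATE_FORMAT('2021-01-04', '%v') AS week_b,       -- '01'
    DATE_FORMAT('2021-01-04', '%v')
      - DATE_FORMAT('2020-12-28', '%v') AS naive_diff;  -- -52, although the dates are one week apart
```

A cumulative week number that never resets would give 1 here, which is exactly what the helper column in the date table provides.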
The monthly version works in exactly the same way, so I will not repeat the explanation. The SQL is as follows ↓
SELECT
    月份,
    COUNT(DISTINCT user_id) 当月用户数,
    -- share of the month's users who were active again exactly N months later
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN mdiff=1 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 次月留存率,
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN mdiff=2 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 两月留存率,
    CONCAT(ROUND(COUNT(DISTINCT CASE WHEN mdiff=3 THEN user_id ELSE NULL END)/COUNT(DISTINCT user_id)*100,2),'%') 三月留存率
FROM
    (SELECT
        lc1.id,
        lc1.user_id,
        DATE(lc1.time) date1,
        DATE(lc2.time) date2,
        DATE_FORMAT(d1.日期,'%Y-%m') 月份,
        d1.monthnum d1mn,               -- cumulative month number of date1
        d2.monthnum d2mn,               -- cumulative month number of date2
        d2.monthnum - d1.monthnum mdiff -- months between the two activities
    FROM
        liucun AS lc1
        LEFT JOIN date AS d1 ON DATE(lc1.time) = d1.日期
        LEFT JOIN liucun AS lc2 ON lc1.user_id = lc2.user_id
        LEFT JOIN date AS d2 ON DATE(lc2.time) = d2.日期) temp
GROUP BY
    月份
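This idea extends easily: retention per 10 days, per 3 days, or per 旬 (the ten-day thirds of a month) can all be built the same way. One small regret still nags at me, though: how to do all of this without a helper table. I hope to figure that out soon.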
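One possible direction for that open question, offered as a sketch rather than a tested solution: compute the running index directly from the date instead of looking it up. The anchor dates (1999-12-27 as an assumed Monday start of week 0, 2000-01-01 as the start of the 10-day buckets) and the two sample dates are my own assumptions. `DATEDIFF` and `PERIOD_DIFF` are built-in MySQL functions.

```sql
SELECT
    -- week difference: whole weeks since a fixed Monday, so it never resets
    FLOOR(DATEDIFF(date2, '1999-12-27') / 7)
      - FLOOR(DATEDIFF(date1, '1999-12-27') / 7) AS wdiff,
    -- month difference: PERIOD_DIFF works on YYYYMM periods across years
    PERIOD_DIFF(DATE_FORMAT(date2, '%Y%m'), DATE_FORMAT(date1, '%Y%m')) AS mdiff,
    -- same pattern for fixed 10-day buckets counted from 2000-01-01
    FLOOR(DATEDIFF(date2, '2000-01-01') / 10)
      - FLOOR(DATEDIFF(date1, '2000-01-01') / 10) AS tdiff
FROM
    (SELECT DATE('2020-12-28') AS date1, DATE('2021-01-04') AS date2) AS t;
```

For these two sample dates both wdiff and mdiff come out as 1 despite the year change. In the retention queries, expressions like these could stand in for the `wdiff` and `mdiff` columns that currently come from the date table.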