数据说明
由两张表组成,其中orderinfo表记录了3月-5月的订单情况,字段包括orderid、userid、ispaid(支付状态)、price和paidtime(支付时间),共539411行。
userinfo记录了所有用户情况,字段包括userid、sex和birth,共101535行。
分析问题
1.统计不同月份的下单人数
2.统计用户三月份的回购率和复购率
3.统计男女用户的消费频次是否有差异
4.统计多次消费的用户,第一次和最后一次消费间隔是多少?
5.统计不同年龄段,用户的消费金额是否有差异?
6.统计消费的二八法则,消费的top20%用户,贡献了多少额度
导入处理
- 在数据库中建表
create table orderinfo(
orderid int primary key not null ,
userid int,
isPaid varchar(10),
price float,
paidTime varchar(30));
create table userinfo(
userid int primary key,
sex varchar(10),
birth date);
2.从cvs导入数据到mysql
load data local infile 'C:/Users/47207/Desktop/order_info_utf.csv' into table orderinfo fields terminated by ',';
load data local infile 'C:/Users/47207/Desktop/user_info_utf.csv' into table userinfo fields terminated by ',';
注:这里fields terminated by ','是指文件内容按‘逗号为分割填入sql表中
-
观察数据,将数据格式转化为统一的sql日期格式
1、先把时间格式标准化为 1993-02-27
update orderinfo set paidtime=replace(paidtime,'/','-') where paidtime is not null;
2、然后更新字符串为日期格式,然后才能使用日期函数进行操作,
update orderinfo set paidtime=str_to_date(paidtime,'%Y-%m-%d %H:%i') where paidtime is not null;
预处理完成如下:
问题解决
1.统计不同月份的下单人数
select DATE_FORMAT(paidtime,'%Y-%M') as month, count(distinct userid)
from orderinfo
where ispaid='已支付'
group by month;
2.统计用户三月份的回购率和复购率
这两个指标逻辑不同:
(1).回购率是三月购买的userid四月依旧购买。
select count(distinct userid)
from orderinfo
where month(paidtime)=4 and ispaid='已支付' and userid in (
select userid
from orderinfo
where ispaid='已支付' and month(paidtime)=3
group by userid);
上述语句得出了4月依旧购买的人数,除以3月购买人数即可方法二:
select count(b.april_user)/count(a.march_user)
from (select userid as march_user
from orderinfo
where isPaid='已支付' and month(paidTime)=3
group by userid) a left join (select userid as april_user
from orderinfo
where isPaid='已支付' and month(paidTime)=4
group by userid) b on a.march_user=b.april_user;
每个月的复购率可以如下计算:
select t1.m,count(t1.m),count(t2.m),count(t2.m)/count(t1.m) from
(select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t1
left join
(select userid,date_format(paidtime,"%Y-%m-01") m from orderinfo where isPaid = "已支付" group by userid,date_format(paidtime,"%Y-%m-01")) t2
on t1.userid = t2.userid and t2.m=date_add(t1.m,interval 1 month) group by t1.m;
(2).复购率是三月所有消费者中,购买2次及以上消费者的比例。
select count(purchase_num), SUM(IF(purchase_num>1,1,NULL)), SUM(IF(purchase_num>1,1,NULL))/count(purchase_num) as rate
FROM (select userid, count(*) as purchase_num
from orderinfo
where ispaid='已支付' and month(paidtime)=3
group by userid) a;
3.统计男女用户的消费频次是否有差异(平均消费次数)
select u.sex, count(o.userid)/count(distinct u.userid)
from orderinfo o, userinfo u
where o.userid=u.userid and o.isPaid='已支付' and u.sex <> ''
group by u.sex;
4.统计多次消费的用户,第一次和最后一次消费间隔是多少?
select userid, min(paidtime), max(paidtime), DATEDIFF(max(paidtime),min(paidtime))
from orderinfo
where ispaid='已支付'
group by userid
having count(*)>1
order by userid asc;
5.统计不同年龄段,用户的消费金额是否有差异?
思路:去除有问题日期数据后,将年龄分段(ceil函数向上取整)。再求出每个用户的年龄段、订单数、总金额。最后求出不同年龄段的订单平均金额。
select a.age_level, sum(a.sm)/sum(a.ct)
from
(select u.userid, u.age_level, count(u.userid) as ct, sum(price) as sm
from orderinfo o join
(select userid, CEIL((year(now())-year(birth))/10) as age_level
from userinfo
where birth > '1901-00-00') u on o.userid=u.userid
where o.isPaid='已支付'
group by u.userid, u.age_level) a
group by a.age_level
order by a.age_level asc;
6.统计消费的二八法则,消费的top20%用户,贡献了多少额度
思路:先求出总消费金额和用户数。
select count(*), sum(a.sum_p)
from(
select userid, sum(price) as sum_p
from orderinfo
where isPaid='已支付'
group by userid
order by sum(price) desc) a;
前百分之20%的用户为17130
select count(*), sum(a.sum_p)
from(
select userid, sum(price) as sum_p
from orderinfo
where isPaid='已支付'
group by userid
order by sum(price) desc
limit 17130) a;
贡献了额度的272201811.46/318501846.54=85%,基本符合二八定律。