Adventure Works Cycles是AdventureWorks样本数据库所虚构的公司,这是一家大型跨国制造公司。本次分析数据库使用datafrog提供的线上数据库,模拟真实业务场景。
公司背景
Adventure Works Cycle是国内一家制造公司,该公司生产和销售金属和复合材料自行车在全国各个市场。销售方式主要有两种,前期主要是分销商模式,但是2018年公司实现财政收入目标后,2019就开始通过公司自有网站获取线上商户进一步扩大市场。
项目背景
2019年12月5日,线上业务经理需要向公司CEO汇报2019年11月自行车销售情况,所以数据部门要提供11月份线上自行业务数据分析报告。
分析目的与思路
此报告目的在于汇报2019年11月综合销售情况,因此主要从如下5个角度展开分析。
1.整体销售表现:分析2019.1—2019.11自行车整体销售表现
2.地域的角度:分析11月每个区域销售量表现、11月TOP10城市销售量表现
3.产品类别的角度:分析11月各类别产品销售量表现、11月细分产品销售量表现
4.热销产品角度:分析11月TOP10产品销量榜、11月TOP10销量增速榜
5.用户行为分析:分析11月用户年龄分布及每个年龄段产品购买喜好、11月男女用户数量分布及男女消费偏好
此次分析分2个步骤:数据预处理(python)+可视化(powerbi),本文主要描述数据预处理的步骤,可视化报告后续进行更新
导入模块
#导入模块
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
from datetime import datetime #使用datetime库
pd.set_option('display.float_format', lambda x: '%.6f' % x)#不显示科学计数法,将显示方法变为数字
一、整体销售表现
1.1、从数据库读取源数据:dw_customer_order
#读取源数据。不同城市,每天产品销售信息
#创建数据库引擎
engine = create_engine('mysql://frogdata001:密码保密@106.13.128.83/adventure_ods?charset=gbk')
sql='select * from dw_customer_order'
gather_customer_order=pd.read_sql_query(sql,con = engine)
#查看源数据前5行,观察数据,判断数据是否正常识别
gather_customer_order.head()
gather_customer_order.info()#查看数据源类型
原始数据共有473050个字段,字段解释如下:
create_date 订单日期
product_name 产品名
cpzl_zw 产品子类
cplb_zw 产品类别
order_num 产品销售数量
customer_num 购买客户数
sum_amount 产品销售金额
is_current_year 是否当前年(1:是,0:否)
is_last_year 是否上一年(1:是,0:否)
is_yesterday 是否昨天(1:是,0:否)
is_today 是否今天(1:是,0:否)
is_current_month 是否当前余额(1:是,0:否)
is_current_quarter 是否当前季度(1:是,0:否)
chinese_province 所在省份
chinese_city 所在城市
chinese_territory 所在区域
增加create_year_month月份字段。按月维度分析时使用
gather_customer_order['create_year_month']=gather_customer_order.create_date.apply(lambda x:x.strftime('%Y-%m'))
筛选产品类别为自行车的数据
gather_customer_order = gather_customer_order[gather_customer_order['cplb_zw']=='自行车']
gather_customer_order
1.2 自行车整体销售量表现
每月订单数量和销售金额
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum','sum_amount':'sum'}).reset_index()
#每月订单数量和销售金额
overall_sales_performance.head()
#按日期降序排序,方便计算环比
求环比:新增一列order_num_diff,此为每月自行车销售订单量环比,本月与上月对比
overall_sales_performance=overall_sales_performance.sort_values('create_year_month',ascending=True)
#销量环比
overall_sales_performance['order_num_diff']=(overall_sales_performance.order_num.diff().fillna(0)/overall_sales_performance.order_num.shift()).fillna(0)
#销售金额环比
overall_sales_performance['sum_amount_diff']=(overall_sales_performance.sum_amount.diff()/overall_sales_performance.sum_amount.shift()).fillna(0)
overall_sales_performance
将最终的overall_sales_performance的DataFrame存入Mysql的当中,后续使用powerbi读取
字段注释:
create_year_month:时间,order_num:本月累计销售数量,sum_amount:本月累计销售金额,order_diff:本月销售数量环比,sum_amount_diff:本月销售金额环比,dw_customer_order:用户订单表
表名:pt_overall_sale_performance_1_sam
#将数据存入数据库
engine = create_engine('mysql://frogdata05:密码保密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
overall_sales_performance.to_sql('pt_overall_sale_performance_1_sam',con=engine,index=False,if_exists='append')
二、2019年11月自行车地域销售表现
2.1、数据清洗筛选10月11月数据
#gather_customer_order在分析自行车整体表现时已从数据库导入表(dw_customer_order),并筛选仅自行车数据,这里不再导入
gather_customer_order.head()
上面是一张表,太长了,分2段截图。
筛选10月11月自行车数据
#筛选10月11月自行车数据
gather_customer_order_10_11 = gather_customer_order[gather_customer_order['create_year_month'].isin(['2019-10','2019-11'])]
len(gather_customer_order_10_11)#10月11月自行车订单数据共6266条
2.2、2019年11月自行车区域销售量表现
由于需要求环比,因此也需要关注10月的销售情况。
10月、11月销售金额总和
#按照区域、月分组,订单量求和,销售金额求和
gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(['chinese_territory','create_year_month'])[['order_num','sum_amount']].sum().reset_index()
gather_customer_order_10_11_group.head()
10月11月的环比
#将区域存为列表
region_list=list(gather_customer_order_10_11.chinese_territory.unique())
region_list
#pct_change()当前元素与先前元素的相差百分比,求不同区域10月11月环比
order_x=pd.Series([])
amount_x=pd.Series([])
for i in region_list:
a=gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory']==i].order_num.pct_change()
b=gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory']==i].sum_amount.pct_change()
order_x=order_x.append(a).sort_index().fillna(0)
amount_x=amount_x.append(b).sort_index().fillna(0)
gather_customer_order_10_11_group['order_diff']=order_x
gather_customer_order_10_11_group['amount_diff']=amount_x
#10月11月各个区域自行车销售数量、销售金额环比
gather_customer_order_10_11_group.head()
字段注释:
chinese_territory:区域,create_year_month:时间,order_num:区域销售数量,sum_amount:区域销售金额,order_diff:本月销售数量环比,amount_diff:本月销售金额环比
将数据存入数据库
表名:pt_bicy_november_territory_2_sam
#将数据存入数据库,表名:pt_bicy_november_territory_2_sam
engine = create_engine('mysql://frogdata05:密码保密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2_sam',con=engine,index=False,if_exists='replace')
2.3、2019年11月自行车销售量TOP10城市环比
(1)筛选11月自行车交易数据
#筛选11月自行车交易数据
gather_customer_order_11 = gather_customer_order[gather_customer_order['create_year_month']=='2019-11']
(2)查看销量前十的城市销售数量
gather_customer_order_city_11 = gather_customer_order_11.groupby('chinese_city')['order_num'].sum().reset_index()
# 11月自行车销售数量前十城市
gather_customer_order_city_head = gather_customer_order_city_11.sort_values(by=['order_num'],ascending=False).head(10)
#查看11月自行车销售数量前十城市
gather_customer_order_city_head
(3)分组计算前十城市,自行车销售数量销售金额
#11月销售前十城市,全部自行车销售数据
a=pd.merge(gather_customer_order_city_head['chinese_city'],gather_customer_order,on='chinese_city',how='left')
#筛选销售前十城市,10月11月自行车销售数据
gather_customer_order_10_11_head =a[a['create_year_month'].isin(['2019-10','2019-11'])]
#分组计算前十城市,自行车销售数量销售金额
gather_customer_order_city_10_11 = gather_customer_order.groupby(['chinese_city','create_year_month']).sum().reset_index()[(gather_customer_order.groupby(['chinese_city','create_year_month']).sum().reset_index().chinese_city.isin(gather_customer_order_city_head.chinese_city)) & (gather_customer_order.groupby(['chinese_city','create_year_month']).sum().reset_index().create_year_month.isin(['2019-10','2019-11']))].drop(labels=['customer_num'],axis=1).reset_index(drop=True)
gather_customer_order_city_10_11
计算10月11月top10城市销售数量和销售金额环比
#计算前十城市环比
city_top_list = list(gather_customer_order_city_10_11.drop_duplicates(['chinese_city']).chinese_city)
order_top_x = pd.Series([])
amount_top_x = pd.Series([])
for i in city_top_list:
#print(i)
a=gather_customer_order_city_10_11[gather_customer_order_city_10_11.chinese_city==i].order_num.pct_change()
b=gather_customer_order_city_10_11[gather_customer_order_city_10_11.chinese_city==i].sum_amount.pct_change()
order_top_x=order_top_x.append(a).fillna(0)
amount_top_x = amount_top_x.append(b).fillna(0)
#order_diff销售数量环比,amount_diff销售金额环比
gather_customer_order_city_10_11['order_diff']=order_top_x
gather_customer_order_city_10_11['amount_diff']=amount_top_x
gather_customer_order_city_10_11
字段注释
chinese_city:城市,create_year_month:时间,order_num:本月销售数量,sum_amount:本月销售金额,order_diff:本月销售数量环比,amount_diff:本月销售金额环比
表名:pt_bicy_november_october_city_3_sam
#存入数据库
engine = create_engine('mysql://frogdata05:保密@106.15.121.232/datafrog05_adventure?charset=gbk')
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3_sam',con=engine,index=False,if_exists='replace')
三、2019年11月自行车各类别产品销售表现
3.1、细分市场销量表现
#求每个月自行车累计销售数量
gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
#合并自行车销售信息表+自行车每月累计销售数量表,pd.merge
order_num_proportion = pd.merge(gather_customer_order,gather_customer_order_group_month,how='left',on='create_year_month')
#计算自行车销量/自行车每月销量占比
order_num_proportion['order_proportion'] = order_num_proportion['order_num_x']/order_num_proportion['order_num_y']
#重命名sum_month_order:自行车每月销售量
order_num_proportion = order_num_proportion.rename(columns={'order_num_y':'sum_month_order'})
将每月自行车销售信息存入数据库
表名:pt_bicycle_product_sales_month_4_sam
字段注释
create_date:时间, product_name:产品名, cpzl_zw:产品类别, cplb_zw:产品大类, order_num_x:产品当天销售数量,customer_num:当天用户购买人数,
sum_amount:产品当天销售金额, chinese_province:省份, chinese_city:城市, chinese_territory:区域,create_year_month:月份, sum_month_order:本月累计销量, order_proportion:产品销量占比
#将每月自行车销售信息存入数据库
engine = create_engine('mysql://frogdata05:保密@106.15.121.232/datafrog05_adventure?charset=gbk')
order_num_proportion.to_sql('pt_bicycle_product_sales_month_4_sam',con=engine,index=False,if_exists='replace')
3.2公路/山地/旅游自行车细分市场表现
(1)公路自行车细分市场销量表现
求每月公路自行车中各个型号自行车销售数量与公路自行车总销量,为后续求各个型号的比率做铺垫
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
#1.求公路自行车不同型号产品销售数量
gather_customer_order_road_month =gather_customer_order_road.groupby(by = ['create_year_month','product_name']).sum().order_num.reset_index()
gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
#2.每个月公路自行车累计销售数量
gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
#3.合并公路自行车gather_customer_order_road_month与每月累计销售数量
#用于计算不同型号产品的占比
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,on='create_year_month',how='outer')
gather_customer_order_road_month.head()
(2)山地自行车细分市场销量表现
与公路自行车处理方式一致
# 筛选
gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行车']
#求山地自行车不同型号产品销售数量
gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行车'
#每个月公路自行车累计销售数量
gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby('create_year_month').sum().reset_index()
gather_customer_order_road_month_sum.head()
#合并山地自行车hz_customer_order_Mountain_month与每月累计销售数量
#用于计算不同型号产品的占比
gather_customer_order_Mountain_month = pd.merge(gather_customer_order_Mountain_month,gather_customer_order_Mountain_month_sum,on='create_year_month')
gather_customer_order_Mountain_month.head()
(3)旅游自行车细分市场销量表现
与公路自行车处理方式一致
gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
#求旅游自行车不同型号产品销售数量
gather_customer_order_tour_month = gather_customer_order_tour.groupby(['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month['cpzl_zw'] = '旅游自行车'
gather_customer_order_tour_month_sum = gather_customer_order_tour_month.groupby('create_year_month').sum().reset_index()
gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month,gather_customer_order_tour_month_sum,on='create_year_month')
(4)将山地自行车、旅游自行车、公路自行车每月销量信息合并
#将山地自行车、旅游自行车、公路自行车每月销量信息合并
gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_Mountain_month,gather_customer_order_tour_month],axis=0)
(5)总销售量比率
求出每种一级子类自行车(公路、山地、旅行)中,各品类的各月的销量占比
#各类自行车,销售量占每月自行车总销售量比率
gather_customer_order_month['order_num_proportio'] =gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
gather_customer_order_month.head()
#改名
#order_month_product当月产品累计销量
#sum_order_month当月自行车总销量
gather_customer_order_month.rename(columns={'order_num_x':'order_month_product','order_num_y':'sum_order_month'},inplace=True)
将细分市场表现数据存入数据库
表名:pt_bicycle_product_sales_order_month_4_sam
字段注释:
create_year_month:时间,product_name:产品名,order_month_product:本月产品累计销量,sum_order_month:当月自行车总销量,order_num_proportio:本月产品销量占比
#将数据存入数据库
engine = create_engine('mysql://frogdata05:secret@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4_sam',con=engine,index=False,if_exists='replace')
3.3、公路/山地/旅游自行车2019年11月销量环比
#计算11月环比,先筛选10月11月数据
gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]
#排序。将10月11月自行车销售信息排序
gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month']).reset_index(drop=True)
product_name = list(gather_customer_order_month_10_11.product_name.drop_duplicates())
#计算自行车销售数量环比
order_num_diff=pd.Series([])
for i in product_name:
a=gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name']==i].order_month_product.pct_change()
order_num_diff=order_num_diff.append(a).fillna(0)
gather_customer_order_month_10_11['order_num_diff'] = order_num_diff
#筛选出11月自行车数据
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month'] == '2019-11']
gather_customer_order_month_11.head(10)
3.4、2019年1月至11月产品累计销量
month_1_11=list(gather_customer_order_month.create_year_month.drop_duplicates()[:11])
gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(month_1_11)]
#计算2019年1月至11月自行车累计销量
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by = 'product_name').order_month_product.sum().reset_index()
#重命名sum_order_1_11:1-11月产品累计销量
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})
3.5 2019年11月自行车产品销量、环比、累计销量(结合3.2-3.4)
累计销量在gather_customer_order_month_1_11_sum中已计算好,11月自行车环比、及产品销量占比在gather_customer_order_month_11已计算好,把2张表merge一下
#按相同字段product_name产品名,合并两张表
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on='product_name')
#观察表数据
gather_customer_order_month_11.head()
存入数据库
表名:pt_bicycle_product_sales_order_month_11_sam
字段注释:
create_year_month:时间;product_name:产品,order_month_product:产品本月累计销量,cpzl_zw:产品类别,sum_order_month:当月自行车总销量,order_num_proportio:产品本月占比,order_num_diff:产品本月环比,sum_order_1_11:1-11月产品累计销量
#存入数据库
engine = create_engine('mysql://frogdata05:Frogdata!1321@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11_sam',index=False,con=engine,if_exists='replace')
四、2019年11月热品销售分析
4.1、11月产品销量TOP10产品,销量及环比
2019年11月自行车产品销售表现”时已计算出11月所有产品的销量及环比,这里不在重复计算,直接使用gather_customer_order_month_10_11、gather_customer_order_month_11
取top10产品
#计算产品销售数量,\ 为换行符
#按照销量降序,取TOP10产品
customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').\
order_num.count().reset_index().sort_values(by = 'order_num',ascending = False).head(10)
计算TOP10销量及环比
只需要五个字段:create_year_month月份,product_name产品名,order_month_product本月销量,cpzl_zw产品类别,order_num_diff本月产品销量环比
customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
#找出top10商品
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].isin(list(customer_order_11_top10['product_name']))]
customer_order_month_10_11['category'] = '本月TOP10销量'
customer_order_month_10_11.head()
4.2、11月产品增速TOP10产品,销售数量及环比
#11月增长率前十的产品信息
customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].sort_values(by = 'order_num_diff',ascending = False).head(10)
#11月top10增长率产品在10月和11月的信息
customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].isin(list(customer_order_month_11['product_name']))]
筛选需要的四个字段:create_year_month月份,product_name产品名,order_month_product本月销量,cpzl_zw产品类别,order_num_diff本月产品销量环比
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_11_top10_seep['category'] = '本月TOP10增速'
customer_order_month_11_top10_seep.head()
合并TOP10销量表和TOP10增速表
#axis = 0按照行维度合并,axis = 1按照列维度合并
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
hot_products_11
将11月Top10销量产品、增速产品表存入数据库
表名:pt_hot_products_november_sam
字段注释:
create_year_month:月份,product_name:产品名,order_month_product:本月产品销量,order_num_diff:本月产品环比,category:分类
#存入数据库
engine = create_engine('mysql://frogdata05:秘密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
datafrog=engine
hot_products_11.to_sql('pt_hot_products_november_sam',con = datafrog,if_exists='replace', index=False)
五、用户行为分析
这里需要使用订单明细表ods_sales_orders,用户表信息表ods_customer
#读取数据库客户信息表
engine = create_engine('mysql://frogdata001:秘密@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = datafrog)
#读取数据库销售订单表
engine = create_engine('mysql://frogdata001:秘密@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_sales_orders_11 = pd.read_sql_query("select * from ods_sales_orders where create_date>='2019-11-1' and create_date<'2019-12-1'",con = datafrog)
销售订单表中仅客户编号,无客户年龄性别等信息,需要将销售订单表和客户信息表合并
#pd.merge
sales_customer_order_11=pd.merge(df_sales_orders_11,df_CUSTOMER,on='customer_key',how='left')
sales_customer_order_11.head()
获取客人的年份作为新的一列 birth_year
sales_customer_order_11.birth_date.fillna('0000-00-00',inplace=True)#有空值
sales_customer_order_11['birth_date'].str.split('-').apply(lambda x:x[0])
sales_customer_order_11['birth_year']=sales_customer_order_11['birth_date'].str.split('-').apply(lambda x:x[0])
sales_customer_order_11.head(3)
5.1、用户年龄分析
#修改出生年为int数据类型
sales_customer_order_11['birth_year'] = sales_customer_order_11['birth_year'].astype(int)
# 计算用户年龄
sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
对年龄进行分层0-34,35-39,40-44,45-49,50-54,55-59,60-64
#年龄分层1
bins=[0,34,39,44,49,54,59,64]
labels=['0-34','35-39','40-44','45-49','50-54','55-59','60-64']
#新增'age_level'分层区间列
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'],bins=bins,labels=labels)
筛选销售订单为自行车的订单信息
#筛选销售订单为自行车的订单信息
df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车']
#查看总数:一共3438个
df_customer_order_bycle.count()
# 计算年龄比率
df_customer_order_bycle['age_level_rate'] = 1/3438
df_customer_order_bycle
将年龄分为3个层次,分别为'<=29'、'30-39'、'>=40'
def cut_age(x):
'''年龄分层函数'''
if x>=40:
return '>=40'
elif x>=30 & x<=39:
return '30-39'
elif x<=29:
return '<=29'
else:
return 'nan'
'''将年龄分为3个层次'''
df_customer_order_bycle.age_level2=df_customer_order_bycle.customer_age
df_customer_order_bycle['age_level2']=df_customer_order_bycle.customer_age.apply(cut_age)
df_customer_order_bycle.head(3)
查看每个年龄段人数
# 求每个年龄段人数
age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').customer_key.count().reset_index()
age_level2_count.head()
将自行车销售订单表和每个年龄段人数表合并,为后续算年龄段比例做准备
df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'customer_key_y':'age_level2_count'})
5.2、用户性别分析
gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
gender_count
将性别个数和自行车销售订单表+每个年龄段人数表合并,形成一个汇总信息表,并计算性别比率。汇总成了用户行为信息总表。
#合并2表
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
#性别比率
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
将11月用户行为信息总表保存到数据库
表名:pt_user_behavior_november_sam
#df_customer_order_bycle 将11月自行车用户存入数据库
#存入数据库
engine = create_engine('mysql://frogdata05:秘密@106.15.121.232:3306/datafrog05_adventure?charset=gbk')
datafrog=engine
df_customer_order_bycle.to_sql('pt_user_behavior_november_sam',con = datafrog,if_exists='replace', index=False)
至此数据预处理的工作已经完成,后续会更新使用powerBI进行可视化的部分
以下为分析报告