Adventure项目

一、项目概况

1. 成果预览

分析概述:本文是对Adventure项目的总结,记录项目分析与实现过程,主要任务是对MySQL数据库中原始数据通过pandasnumpy聚合、分析,并使用聚合后数据进行dashboard的搭建,为业务、运营人员提供自主分析工具。在聚合后数据的基础上,汇报2019年11月自行车销售情况,为精细化运营提供数据支持,精准定位目标客户群体。

可视化看板预览:

首页

PPT预览:

PPT

2. 背景介绍

2.1 公司背景

Adventure Works Cycles是基于微软Adventure Works示例数据库的虚拟公司,是一家大型跨国制造公司。该公司生产金属和复合材料自行车,并在全国进行销售。2019年12月业务部门需要向上级汇报11月份的自行车销售情况,为精细化运营提供数据支持,精准定位目标客户群体。

2.2 产品介绍

产品分为三个大类:自行车(公司主要产品)、服装、配件

  • 自行车:分为三种类型,公路自行车、山地自行车、旅游自行车
  • 服装:包括帽子、手套、短裤、背心、袜子、运动衫
  • 配件:包括头盔、挡泥板、水袋、清洁工、瓶子和笼子、自行车架、自行车看台、车胎和内胎

二、数据搭建

1. 指标搭建

1.1 查看数据

adventure Works Cycles公司的mysql数据库中有三张表:

  • ods_sales_orders:订单明细表(一个订单表示销售一个产品),主要包括每个订单的详细信息:订单主键、订单日期、客户编号、产品编号、产品名、产品类别、产品子类、产品单价这些字段


    订单明细表
  • ods_customer:每日新增用户表,本表与“ods_sales_orders 订单明细表”可通过客户编号字段关联,本表记录每日新增客户系信,包括日期、出生日期、性别、婚否、年收入以及详细地理位置:省份、城市、地址、区域和经纬度等字段

每日新增用户表
  • dim_date_df:日期维度表,本表可通过日期字段与上述两张表关联,本表通过1和0对是否是当日、当月、当年、等字段的记录,可在后续对表的聚合加工中,对订单表与用户表进行不同时间维度查询


    日期维度表

1.2 确定需求

结合业务要求和现有数据,确定数据指标,为后续dashboard搭建和相关报告输出,提供完善数据。对MySQL数据库中三张表数据进行聚合和各维度关联等操作,完成下面三张表格:

  • dw_order_by_day:每日环比表
  1. 本表是对每日数据的一个汇总聚合,通过对ods_sales_orders(订单明细表)中的“create_date”字段进行聚合,获得每日总销售金额、总销量、并计算出平均订单价
  2. 设定每日销售量与销售金额目标值,便于后续dashboard展示目标完成情况
  3. 通过“create_date”字段关联dim_date_df(日期维度表),方便后续对日、月、季度、年等时间维度的销售情况进行聚合展示
  4. 计算每日销售金额与昨日销售金额环比
每日环比表
  • dw_customer_order:时间-地区-产品聚合表
  1. 提取当日维度下ods_sales_orders(订单明细表)和ods_customer(每日新增用户表),并通过“customer_key”字段关联两张表
  2. 提取订单主键、订单日期、客户编号、产品名、产品子类、产品类别、产品单价、所在区域、所在省份、所在城市这些字段,使用groupby方法对订单日期、产品名、产品子类、产品类别、所在区域、所在省份、所在城市的逐级聚合
  3. 通过“create_date”字段关联dim_date_df(日期维度表),方便后续对日、月、季度、年等时间维度产品信息和地区信息的聚合展示
  4. 获得相应的订单总量、客户总量和销售总金额,完成对当日时间维度下产品和地区的聚合,可用于dashboard地区维度的切片和地图可视化
时间-地区-产品聚合表
  • dw_amount_diff:当日维度表
  1. 提取生成的的dw_order_by_day(每日环比表)的数据
  2. 通过“is_today”、“is_yesterday”、“is_current_year”、“is_last_year”等这些字段中的值:1和0来判断时间
  3. 对当日、昨日、今年、去年等的销售金额,销售量和客单价聚合起来,完成当日同比、昨日同比 、本月同比、本季度同比以及本年同比
  4. 使用“amount”、“order“、”avg“标记销售金额,销售量和客单价的同比结果,方便横向提取数据
每日环比表

2. python处理数据

使用python,连接服务器中MySQL数据库,通过关联、聚合、计算以及一些格式的转换等步骤,完成上述指标的搭建,并将生成的表格上传至MySQL数据库。详细代码如下:

2.1 生成dw_order_by_day表

导入相关模块,连接数据库

import pandas as pd
import pymysql
import random
from sqlalchemy import create_engine
pymysql.install_as_MySQLdb()
import datetime
import warnings
warnings.filterwarnings("ignore")
# 连接adventure_ods库
adventure_ods = create_engine('mysql://root:password@localhost:3306/adventure_ods?charset=gbk')
# 连接adventure_dw库
adventure_dw = create_engine('mysql://root:password@localhost:3306/adventure_dw?charset=gbk')

step1:读取ods_sales_orders(订单明细表),生成sum_amount_order(销量订单聚合表),求总销售金额与客单价

# 1、读取ods_sales_orders(订单明细表)
ods_sales_orders = pd.read_sql_query("select * from ods_sales_orders ", con=adventure_ods)
# 2、根据create_date分组求总销量及客户数量
sum_amount_order = ods_sales_orders.groupby(by='create_date').agg({'unit_price': sum, 'customer_key': pd.Series.nunique}).reset_index()
# 3、修改对应列名(unit_price→sum_amount', 'customer_key'→ 'sum_order)
sum_amount_order.rename(columns = {'unit_price': 'sum_amount', 'customer_key': 'sum_order'},inplace = True)
# 4、新增amount_div_order客单价列
sum_amount_order['amount_div_order'] = sum_amount_order['sum_amount'] / sum_amount_order['sum_order']

step2:利用空列表及循环生成对应随机值,与销量订单聚合表合并形成sum_amount_order_goal(销量订单聚合目标表)

sum_amount_goal_list = []
sum_order_goal_list = []
# 转为list类型,遍历每个日期
create_date_list = list(sum_amount_order['create_date'])
for i in create_date_list:
    # 1.生成一个在[0.85,1.1]随机数
    a = random.uniform(0.85,1.1)
    b = random.uniform(0.85,1.1)
    
    # 2.对应日期下生成总金额(sum_amount)*a的列
    amount_goal = list(sum_amount_order[sum_amount_order['create_date'] == i]['sum_amount'])[0] * a

    # 3.对应日期下生成总订单数(sum_order)*a的列
    order_goal = list(sum_amount_order[sum_amount_order['create_date'] == i]['sum_order'])[0] * b
    
    # 4.将生成的目标值加入空列表
    sum_amount_goal_list.append(amount_goal)
    sum_order_goal_list.append(order_goal)
    
# 5.合并sum_amount_order表与刚生成的目标值列,形成sum_amount_order_goal
sum_amount_order_goal = pd.concat([sum_amount_order, pd.DataFrame({'sum_amount_goal': sum_amount_goal_list, 'sum_order_goal': sum_order_goal_list})], axis=1)

step3:读取dim_date_df日期维度表

dim_date_df = """ 
        select create_date,
                is_current_year,
                is_last_year,
                is_yesterday,
                is_today,
                is_current_month,
                is_current_quarter
                from dim_date_df"""
dim_date_df = pd.read_sql_query(dim_date_df, con=adventure_ods)

step4:进行数据的融合,生成dw_order_by_day表

# 1.转化create_date格式为标准日期格式
sum_amount_order_goal['create_date'] = sum_amount_order_goal['create_date'].apply(lambda x: x.strftime('%Y-%m-%d'))
# 2.通过主键create_date连接日期维度,输出:dw_order_by_day每日环比表
dw_order_by_day = pd.merge(sum_amount_order_goal, dim_date_df, on='create_date', how='inner')
# 3.在sum_amount列基础上,增添环比数据列amount_diff
dw_order_by_day['amount_diff'] = dw_order_by_day['sum_amount'].pct_change()
dw_order_by_day['amount_diff'].fillna(0,inplace=True)

step5:删除旧的dw_order_by_day(每日环比表),存储新的dw_order_by_day

# 因为删除插入更新操作没有返回值,程序会抛出ResourceClosedError,并终止程序。使用try捕捉此异常。
try:
    # 作业代码:删除旧表数据
    pd.read_sql_query("Truncate table dw_order_by_day", con=adventure_dw)
    
except Exception as e:
    print("删除旧的dw_order_by_day表,error:{}".format(e))
# 替换原先的dw_order_by_day
dw_order_by_day.to_sql('dw_order_by_day', con=adventure_dw, if_exists='replace', index=False)

2.2 生成dw_customer_order表

导入相关模块,连接数据库

import pandas as pd
import pymysql
from sqlalchemy import create_engine
pymysql.install_as_MySQLdb()
import datetime
import warnings
warnings.filterwarnings("ignore")
# 连接adventure_ods库
adventure_ods = create_engine('mysql://root:password@localhost:3306/adventure_ods?charset=gbk')
# 连接adventure_dw库
adventure_dw = create_engine('mysql://root:password@localhost:3306/adventure_dw?charset=gbk')

step1:读取最新日期的ods_sales_orders(订单明细表)

ods_sales_orders = \
        '''select sales_order_key,
                  create_date,
                  customer_key,
                  english_product_name,
                  cpzl_zw,
                  cplb_zw,
                  unit_price  
            from ods_sales_orders where create_date =(
            select create_date
            from dim_date_df
            order by create_date desc
            limit 1) '''
ods_sales_orders = pd.read_sql_query(ods_sales_orders, con=adventure_ods)
ods_sales_orders.info()

step2:读取每日新增用户表ods_customer

ods_customer = ''' 
       select customer_key,
               chinese_territory,
               chinese_province,
               chinese_city
               from ods_customer'''
ods_customer = pd.read_sql_query(ods_customer, con=adventure_ods)
ods_customer.info()

step3:读取日期维度表dim_date_df

dim_date_df = """ 
       select create_date,
               is_current_year,
               is_last_year,
               is_yesterday,
               is_today,
               is_current_month,
               is_current_quarter
               from dim_date_df"""
dim_date_df = pd.read_sql_query(dim_date_df, con=adventure_ods)
dim_date_df.info()

step4:进行数据的聚合

# 将customer_key由int类型转为str类型
ods_customer['customer_key'] = ods_customer['customer_key'].map(lambda x:str(x))
# 通过客户id连接表
sales_customer_order = pd.merge(ods_sales_orders, ods_customer, left_on='customer_key', right_on='customer_key', how='left')
# 1.提取订单主键/订单日期/客户编号/产品名/产品子类/产品类别/产品单价/所在区域/所在省份/所在城市
sales_customer_order = sales_customer_order[['sales_order_key', 'create_date', 'customer_key', 'english_product_name',\
                                             'cpzl_zw','cplb_zw','unit_price', 'chinese_territory', 'chinese_province', 'chinese_city']]
# 2.形成按照 订单日期/产品名/产品子类/产品类别/所在区域/所在省份/所在城市的逐级聚合表,获得订单总量/客户总量/销售总金额
sum_customer_order = sales_customer_order.groupby(['create_date','english_product_name','cpzl_zw','cplb_zw','chinese_territory',\
                                                 'chinese_province', 'chinese_city'],as_index=False).\
                                                agg({'sales_order_key':pd.Series.nunique,'customer_key':pd.Series.nunique,'unit_price':sum})
sum_customer_order.rename(columns = {'sales_order_key':'order_num','customer_key':'customer_num',\
                                     'unit_price':'sum_amount','english_product_name':'product_name'},inplace=True)
# 3.转化订单日期为字符型格式
sum_customer_order['create_date'] = sum_customer_order['create_date'].apply(lambda x: x.strftime('%Y-%m-%d'))
# 4.获取当日日期维度
dw_customer_order = pd.merge(sum_customer_order, dim_date_df, on='create_date', how='inner')

step5:进行数据的存储

dw_customer_order.to_sql('dw_customer_order', con=adventure_dw,if_exists='replace', index=False)

2.3 生成dw_amount_diff表

导入相关模块,连接数据库

import pandas as pd
import pymysql
from sqlalchemy import create_engine
pymysql.install_as_MySQLdb()
import datetime
import warnings
warnings.filterwarnings("ignore")
# 连接adventure_dw库
adventure_dw = create_engine('mysql://root:password@localhost:3306/adventure_dw?charset=gbk')

step1:读取dw_order_by_day表

dw_order_by_day = pd.read_sql_query("select * from dw_order_by_day", con=adventure_dw)
dw_order_by_day['create_date'] = pd.to_datetime(dw_order_by_day['create_date'])

step2:求取各阶段的总金额

"""当天"""
# 当天的总金额
today_amount = dw_order_by_day[dw_order_by_day['is_today'] == 1]['sum_amount'].sum()
# 去年同期的日期维度
before_year_today = list(dw_order_by_day[dw_order_by_day['is_today'] == 1]['create_date'] + datetime.timedelta(days=-365))
# 去年同期总金额
before_year_today_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_today)]['sum_amount'].sum()

"""昨天"""
# 昨天的总金额
yesterday_amount = dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['sum_amount'].sum()
# 去年同期的日期维度
before_year_yesterday = list(dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['create_date'] + datetime.timedelta(days=-365))
# 去年同期总金额
before_year_yesterday_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_yesterday)]['sum_amount'].sum()

"""当前月份"""
# 当月的总金额
month_amount = dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['sum_amount'].sum()
# 去年同期的日期维度
before_year_month = list(dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['create_date'] + datetime.timedelta(days=-365))
# 去年同期总金额
before_year_month_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_month)]['sum_amount'].sum()

"""当前季度"""
# 当季的总金额
quarter_amount = dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['sum_amount'].sum()
# 去年同期的日期维度
before_year_quarter = list(dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['create_date'] + datetime.timedelta(days=-365))
# 去年同期总金额
before_year_quarter_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_quarter)]['sum_amount'].sum()

"""当前年份"""
# 当年的总金额
year_amount = dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['sum_amount'].sum()
# 去年同期的日期维度
before_year_year = list(dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['create_date'] + datetime.timedelta(days=-365))
# 去年同期总金额
before_year_year_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_year)]['sum_amount'].sum()

step3:求取各阶段的总客户数

"""当天"""
# 当天的总客户数
today_order = dw_order_by_day[dw_order_by_day['is_today'] == 1]['sum_order'].sum()
# 去年同期总客户数
before_year_today_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_today)]['sum_order'].sum()

"""昨天"""
# 昨天的总客户数
yesterday_order = dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['sum_order'].sum()
# 去年同期总客户数
before_year_yesterday_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_yesterday)]['sum_order'].sum()

"""当前月份"""
# 当月的总客户数
month_order = dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['sum_order'].sum()
# 去年同期总客户数
before_year_month_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_month)]['sum_order'].sum()

"""当前季度"""
# 当季的总客户数
quarter_order = dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['sum_order'].sum()
# 去年同期总客户数
before_year_quarter_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_quarter)]['sum_order'].sum()

"""当前年份"""
# 当年的总客户数
year_order = dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['sum_order'].sum()
# 去年同期总客户数
before_year_year_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_year)]['sum_order'].sum()

step4:求取各阶段的总金额、订单数的同期对比数据

amount_dict = {'today_diff': [today_amount / before_year_today_amount - 1,
                              today_order / before_year_today_order - 1,
                              (today_amount / today_order) / (before_year_today_amount /before_year_today_order) - 1],
               'yesterday_diff': [yesterday_amount / before_year_yesterday_amount - 1,
                                  yesterday_order / before_year_yesterday_order - 1,
                                  (yesterday_amount / yesterday_order) / (before_year_yesterday_amount / before_year_yesterday_order) - 1],
               'month_diff': [month_amount / before_year_month_amount - 1,
                              month_order / before_year_month_order - 1,
                              (month_amount / month_order) / (before_year_month_amount / before_year_month_order) - 1],
               'quarter_diff': [quarter_amount / before_year_quarter_amount - 1,
                                quarter_order / before_year_quarter_order - 1,
                                (quarter_amount / quarter_order) / (before_year_quarter_amount / before_year_quarter_order) - 1],
               'year_diff': [year_amount / before_year_year_amount - 1,
                             year_order / before_year_year_order - 1,
                             (year_amount / year_order) / (before_year_year_amount / before_year_year_order) - 1],
               'flag': ['amount', 'order', 'avg']}

dw_amount_diff = pd.DataFrame(amount_dict)

step5:删除旧的dw_amount_diff表,存储新的dw_amount_diff表

# 因为删除插入更新操作没有返回值,程序会抛出ResourceClosedError,并终止程序。使用try捕捉此异常。
try:
    pd.read_sql_query("Truncate table dw_amount_diff", con=adventure_dw)
except Exception as e:
    print("删除旧的dw_amount_diff表,error:{}".format(e))
# 存储新的dw_amount_diff表
dw_amount_diff.to_sql('dw_amount_diff', con=adventure_dw, if_exists='replace', index=False)

3. Tableau搭建Dashboard

3.1 周期性看板

当日报表

本月报表

本季报表

本年报表

3.2 异常监控

异常监控

3.3 区域细分

区域细分

三、报告输出

1. 指标搭建

1.1 目的

  • 根据需求,需要输出2019.11月自行车销售情况,为精细化运营提供数据支持,能精准的定位目标客户群体;
  • 制定销售策略,调整产品结构,才能保持高速增长,获取更多的收益,占领更多市场份额;
  • 报告通过对整个公司的自行车销量持续监测和分析,掌握公司自行车销售状况、走势的变化,为客户制订、调整和检查销售策略,完善产品结构提供依据。

1.2 数据来源

  • dw_customer_order:产品销售信息事实表
  • ods_customer:每天新增客户信息表
  • dim_date_df:日期表
  • ods_sales_orders:订单明细表

1.3 指标制定

  • 从整体的角度:分析2019.1—2019.11自行车整体销售表现
  • 从地域的角度:分析11月每个区域销售量表现、11月TOP10城市销售量表现
  • 从产品的角度:分析11月类别产品销售量表现、11月细分产品销售量表现
  • 热销产品:分析11月TOP10产品销量榜、11月TOP10销量增速榜
  • 从用户的角度:分析11月用户年龄分布及每个年龄段产品购买喜好、11月男女用户比例及产品购买喜好

2. python加工数据

代码如下:

#导入模块
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine

part1:自行车整体销售表现

'''1. 从数据库读取源数据:dw_customer_order'''
# 读取dw_customer_order
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_ods')
gather_customer_order=pd.read_sql_query("select * from dw_customer_order",con = engine)
# 查看源数据前5行,观察数据,判断数据是否正常识别
gather_customer_order.head()
# 查看源数据类型
gather_customer_order.info()
# 利用create_date字段增加create_year_month月份字段
gather_customer_order['create_year_month'] = gather_customer_order["create_date"].apply(lambda x: x.strftime('%Y-%m'))
# 筛选产品类型cplb_zw中的自行车作为新的gather_customer_order
gather_customer_order = gather_customer_order[gather_customer_order['cplb_zw']=="自行车"]

'''2. 自行车整体销售量表现'''
# 取消科学计数法
pd.set_option('float_format', lambda x: '%.6f' % x)
#每月订单数量和销售金额,用groupby创建一个新的对象,需要将order_num、sum_amount求和,按日期降序排序,方便计算环比
overall_sales_performance = gather_customer_order.groupby("create_year_month").agg({"order_num":"sum","sum_amount":"sum"}).\
                                                    sort_values(by = "create_year_month",ascending = False).reset_index()
# 新增一列order_num_diff,此为每月自行车销售订单量环比,本月与上月对比,例如本期2019-02月销售额与上一期2019-01月销售额做对比
order_num_diff = list(-(overall_sales_performance.order_num.diff()/overall_sales_performance.order_num))
 # 删除列表中第一个元素
order_num_diff.pop(0)
# 将0新增到列表末尾
order_num_diff.append(0) 
# 将环比转化为DataFrame
overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame(order_num_diff)],axis=1)
# 新增一列sum_amount_diff,此为每月自行车销售金额环比,最后形成按照日期升序排列
sum_amount_diff = list(-(overall_sales_performance.sum_amount.diff()/overall_sales_performance.sum_amount))
sum_amount_diff.pop(0) 
sum_amount_diff.append(0) 
sum_amount_diff
# 将环比转化为DataFrame
overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame(sum_amount_diff)],axis = 1)
# 销量环比字段名order_diff,销售金额环比字段名sum_amount_diff
overall_sales_performance.columns = ['create_year_month', 'order_num', 'sum_amount', 'order_diff','sum_amount_diff']
# 按照日期排序,升序
overall_sales_performance = overall_sales_performance.sort_values(by = "create_year_month",ascending = True)

'''3. 将最终的overall_sales_performance的DataFrame存入数据库中'''
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
overall_sales_performance.to_sql('pt_overall_sale_performance_1_cc',con = engine,if_exists='replace')

overall_sales_performance

part2:2019年11月自行车地域销售表现

'''1. 源数据dw_customer_order,数据清洗筛选10月11月数据'''
# 筛选10、11月的自行车数据,赋值变量为gather_customer_order_10_11
gather_customer_order_10_11 = gather_customer_order[gather_customer_order["create_year_month"].isin(['2019-10','2019-11'])]
# 查看10月、11月的自行车订单数量
len(gather_customer_order_10_11)

'''2. 2019年11月自行车区域销售量表现'''
#按照区域、月分组,订单量求和,销售金额求和
gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(["chinese_territory","create_year_month"]).\
                                            agg({"order_num":"sum","sum_amount":"sum"}).reset_index()
#将区域存为列表
region_list = list(gather_customer_order["chinese_territory"].unique())
# pct_change()当前元素与先前元素的相差百分比,求不同区域10月11月环比
order_x = pd.Series([])
amount_x = pd.Series([])
for i in region_list:
    a = gather_customer_order_10_11_group.loc[gather_customer_order_10_11_group['chinese_territory']==i]['order_num'].pct_change()
    b = gather_customer_order_10_11_group.loc[gather_customer_order_10_11_group['chinese_territory']==i]['sum_amount'].pct_change()
    order_x = order_x.append(a)
    amount_x = amount_x.append(b)
gather_customer_order_10_11_group['order_diff'] = order_x
gather_customer_order_10_11_group['amount_diff'] = amount_x
# 填充NaN值
gather_customer_order_10_11_group.fillna(value = 0,inplace = True)
# 将数据存入数据库
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2_cc',con = engine,if_exists='replace')
gather_customer_order_10_11_group
'''3. 2019年11月自行车销售量TOP10城市环比'''
#筛选11月自行车交易数据
gather_customer_order_11 = gather_customer_order[(gather_customer_order["create_year_month"]=='2019-11')]
# 将gather_customer_order_11按照chinese_city城市分组,求和销售数量order_num,最终查看11月自行车销售数量前十城市,赋予变量gather_customer_order_city_head
gather_customer_order_city_11 = gather_customer_order_11.groupby("chinese_city").agg({"order_num":"sum"}).reset_index().\
                                                    sort_values(by = "order_num",ascending = False)
#11月自行车销售数量前十城市
gather_customer_order_city_head = gather_customer_order_city_11.head(10)
#筛选销售前十城市的10月11月自行车销售数据
gather_customer_order_10_11_head = gather_customer_order_10_11[gather_customer_order_10_11["chinese_city"].isin(list(gather_customer_order_city_head.chinese_city))]
#分组计算前十城市,自行车销售数量销售金额
gather_customer_order_city_10_11 = gather_customer_order_10_11_head.groupby(["chinese_city","create_year_month"]).\
                                            agg({"order_num":"sum","sum_amount":"sum"}).reset_index()
#计算前十城市环比
city_top_list = gather_customer_order_city_head["chinese_city"]
order_top_x = pd.Series([])
amount_top_x = pd.Series([])
for i in city_top_list:
    a=gather_customer_order_city_10_11.loc[gather_customer_order_city_10_11['chinese_city']==i]['order_num'].pct_change()
    b=gather_customer_order_city_10_11.loc[gather_customer_order_city_10_11['chinese_city']==i]['sum_amount'].pct_change()
    order_top_x = order_top_x.append(a)
    amount_top_x = amount_top_x.append(b)
#order_diff销售数量环比,amount_diff销售金额环比
gather_customer_order_city_10_11['order_diff']=order_top_x
gather_customer_order_city_10_11['amount_diff']=amount_top_x
# 填充NaN值
gather_customer_order_city_10_11.fillna(value = 0,inplace = True)
# 11月销售数量前十城市的销售数量、销售金额环比。存入数据库
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3_cc',con = engine,if_exists='replace')
gather_customer_order_city_10_11

part3:2019年11月自行车产品销售表现

# 细分市场销量表现
'''1. 数据源 dw_customer_order'''
#查看数据源
gather_customer_order.head(5)

'''2.细分市场销量表现'''
#求每个月自行车累计销售数量
gather_customer_order_group_month = gather_customer_order.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
#合并自行车销售信息表+自行车每月累计销售数量表
order_num_proportion = pd.merge(gather_customer_order,gather_customer_order_group_month,how = "left",on = "create_year_month")
#计算自行车销量/自行车每月销量占比
order_num_proportion['order_proportion'] = order_num_proportion['order_num_x']/order_num_proportion['order_num_y']
#重命名sum_month_order:自行车每月销售量
order_num_proportion = order_num_proportion.rename(columns = {"order_num_x":"order_num","order_num_y":"sum_month_order"})
#将每月自行车销售信息存入数据库
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
order_num_proportion.to_sql('pt_bicycle_product_sales_month_4_cc',con = engine,if_exists = 'replace')
order_num_proportion
'''3. 公路/山地/旅游自行车细分市场表现'''
#查看自行车有那些产品子类
gather_customer_order['cpzl_zw'].unique()
'''3.1 公路自行车细分市场销量表现'''
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
#求公路自行车不同型号产品销售数量
gather_customer_order_road_month = gather_customer_order_road.groupby(by = ['create_year_month','product_name']).agg({"order_num":"sum"}).reset_index()
gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
#每个月公路自行车累计销售数量
gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
#合并公路自行车gather_customer_order_road_month与每月累计销售数量
#用于计算不同型号产品的占比
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,how='left',on='create_year_month')
'''3.2 山地自行车细分市场销量表现'''
# 筛选
gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行车']
#求山地自行车不同型号产品销售数量
gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(by = ['create_year_month','product_name']).agg({"order_num":"sum"}).reset_index()
gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行车'
#每个月山地自行车累计销售数量
gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
#合并山地自行车hz_customer_order_Mountain_month与每月累计销售数量
#用于计算不同型号产品的占比
gather_customer_order_Mountain_month = pd.merge(gather_customer_order_Mountain_month,gather_customer_order_Mountain_month_sum,how='left',on='create_year_month')

'''3.3 旅游自行车细分市场销量表现'''
gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
#求旅游自行车不同型号产品销售数量
gather_customer_order_tour_month = gather_customer_order_tour.groupby(by = ['create_year_month','product_name']).agg({"order_num":"sum"}).reset_index()
gather_customer_order_tour_month['cpzl_zw'] = '旅游自行车'
gather_customer_order_tour_month_sum = gather_customer_order_tour_month.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month,gather_customer_order_tour_month_sum,how='left',on='create_year_month')

'''3.4 将山地自行车、旅游自行车、公路自行车每月销量信息合并'''
gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_Mountain_month,gather_customer_order_tour_month])
gather_customer_order_month
#各类自行车,销售量占每月自行车总销售量比率
gather_customer_order_month['order_num_proportio'] = gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
# 重命名:order_month_product当月产品累计销量,sum_order_month当月自行车总销量
gather_customer_order_month.rename(columns = {'order_num_x':'order_month_product','order_num_y':'sum_order_month'})
#将数据存入数据库
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4_cc',con = engine,if_exists = 'replace')
gather_customer_order_month
'''4. 计算2019年11月自行车环比'''
#计算11月环比,先筛选10月11月数据
gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]
#排序。将10月11月自行车销售信息排序
gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month'])
product_name  = list(gather_customer_order_month_10_11.product_name.drop_duplicates())

'''5. 计算自行车销售数量环比'''
order_top_x = pd.Series([])
for i in product_name:
    a = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name']==i]["order_num_x"].pct_change()
    order_top_x = order_top_x.append(a)
gather_customer_order_month_10_11['order_num_diff'] = order_top_x
gather_customer_order_month_10_11.fillna(value = 0,inplace = True)
gather_customer_order_month_10_11.rename(columns = {"order_num_x":"order_month_product","order_num_y":"sum_order_month"},inplace = True)
#筛选出11月自行车数据
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month'] == '2019-11']

'''6. 计算2019年1月至11月产品累计销量'''
#筛选2019年1月至11月自行车数据,这里使用的是‘~’取对立面
gather_customer_order_month_1_11 = gather_customer_order_month[~gather_customer_order_month['create_year_month'].\
                                isin(['2019-12','2020-01','2020-02','2020-03','2020-04','2020-05','2020-06','2020-07','2020-08','2020-09','2020-10','2020-11'])]
gather_customer_order_month_1_11.rename(columns = {"order_num_x":"order_month_product","order_num_y":"sum_order_month"},inplace = True)
#计算2019年1月至11月自行车累计销量
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by = 'product_name').agg({'order_month_product':'sum'}).reset_index()
#重命名sum_order_1_11:1-11月产品累计销量
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})

'''7. 2019年11月自行车产品销量、环比、累计销量'''
#按相同字段product_name产品名,合并两张表
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,how='left',on='product_name')

''' 8. 存入数据库'''
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11_cc',con = engine, if_exists = 'replace')

gather_customer_order_month_11

part4:用户行为分析

# 读取数据库客户信息表
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_ods?charset=gbk')
df_customer = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = engine)
# 查看数据类型
df_customer.info()
# 将customer_key由int类型转为str类型
df_customer['customer_key'] = df_customer.customer_key.map(lambda x:str(x))

# 读取数据库销售订单表
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_ods?charset=gbk')
df_sales_orders_11 = pd.read_sql_query("select *  from ods_sales_orders where create_date>='2019-11-1' and   create_date<'2019-12-1'",con = engine)

# pd.merge,how = 'inner join'
sales_customer_order_11 = pd.merge(df_sales_orders_11,df_customer,on='customer_key',how='inner')

# 根据sales_customer_order_11['birth_date'],获取客人的年份作为新的一列,以字符串类型存储
birth_year = sales_customer_order_11.birth_date.astype(str).apply(lambda x: x.split('-')[0]).rename('birth_year')
sales_customer_order_11 = pd.concat([sales_customer_order_11,birth_year],axis = 1)

'''1. 用户年龄分析'''
#修改出生年为int数据类型
sales_customer_order_11['birth_year'] = sales_customer_order_11.birth_year.astype(int)
# 计算用户年龄
sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
# 利用customer_age字段,进行年龄分层,
#年龄分层1
listBins = [30,35,40,45,50,55,60,65]
listLabels = ['31-35','36-40','41-45','46-50','51-55','56-60','61-65']
#新增'age_level'分层区间列
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'],bins=listBins, labels=listLabels, include_lowest=False)
#筛选销售订单为自行车的订单信息
df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车']
# 计算年龄比率
df_customer_order_bycle['age_level_rate'] = 1/df_customer_order_bycle['age_level'].count()
# 年龄分层2
# 将年龄分为3个层次,分别为'<=29'、'30-39'、'>=40'
listBins = [0,29,39,65]
listLabels = ['<=29','30-39','>=40']
df_customer_order_bycle['age_level2'] = pd.cut(sales_customer_order_11['customer_age'],bins=listBins, labels=listLabels, include_lowest=False)
# 求每个年龄段人数
age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').sales_order_key.count().reset_index()

'''2. 用户性别'''
gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'sales_order_key_y':'age_level2_count'})
df_customer_order_bycle['age_level2_rate'] = 1/df_customer_order_bycle['age_level2_count']
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']

#df_customer_order_bycle 将11月自行车用户存入数据库
#存入数据库
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
df_customer_order_bycle.to_sql('pt_user_behavior_november_cc',con = engine,if_exists='replace', index=False)

df_customer_order_bycle

part5:2019年11月热品销售分析

'''1. 11月产品销量TOP10产品,销售数量及环比'''
# 筛选11月数据
gather_customer_order_11 = gather_customer_order.loc[gather_customer_order['create_year_month'] == '2019-11']
# 按照销量降序,取出TOP10产品
customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').order_num.count().reset_index().sort_values(by = 'order_num',ascending = False).head(10)
# TOP10销量产品信息
list(customer_order_11_top10['product_name'])

# 计算TOP10销量及环比
# 查看11月环比数据
gather_customer_order_month_10_11.head()
customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].isin(list(customer_order_11_top10['product_name']))]
customer_order_month_10_11['category'] = '本月TOP10销量'

'''2. 11月增速TOP10产品,销售数量及环比'''
customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].sort_values(by = 'order_num_diff',ascending = False).head(10)
customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].isin(list(customer_order_month_11['product_name']))]
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_11_top10_seep['category'] = '本月TOP10增速'

# 合并TOP10销量表customer_order_month_10_11,TOP10增速customer_order_month_11_top10_seep
#axis = 0按照行维度合并,axis = 1按照列维度合并
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
#存入数据库
engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
hot_products_11.to_sql('pt_hot_products_november_cc',con = engine,if_exists='replace', index=False)
hot_products_11

3. PPT汇报输出

通过上述数据加工,将存入mysql的数据接入powerbi与Excel,分别绘制图像与表格,通过以下5个方面,完成线上自行车业务分析报告:

  • 整体销售表现
  • 地域销售表现
  • 产品销售表现
  • 热品销售表现
  • 用户行为分析
幻灯片1.PNG

幻灯片2.PNG

幻灯片3.PNG

幻灯片4.PNG

幻灯片5.PNG

幻灯片6.PNG

幻灯片7.PNG

幻灯片8.PNG

幻灯片9.PNG

幻灯片10.PNG

幻灯片11.PNG

幻灯片12.PNG

幻灯片13.PNG

幻灯片14.PNG

幻灯片15.PNG

幻灯片16.PNG

幻灯片17.PNG

幻灯片18.PNG

幻灯片19.PNG

幻灯片20.PNG

幻灯片21.PNG

幻灯片22.PNG

幻灯片23.PNG

幻灯片24.PNG
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
禁止转载,如需转载请通过简信或评论联系作者。
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 213,864评论 6 494
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,175评论 3 387
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 159,401评论 0 349
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,170评论 1 286
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,276评论 6 385
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,364评论 1 292
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,401评论 3 412
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,179评论 0 269
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,604评论 1 306
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,902评论 2 328
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,070评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,751评论 4 337
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,380评论 3 319
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,077评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,312评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,924评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,957评论 2 351

推荐阅读更多精彩内容