要求:
1、表格有ID、Total、Date、Product、Subcategor、Category数据
2、以Category和Date的年份分组统计Total的总和,个数。
import pandas as pd
from datetime import date
import numpy as np
# 输出能美观一些
pd.set_option('display.max_columns', 10000, 'display.max_rows', 10000,'display.width', 1000)
Orders=pd.read_excel("D:\\python_pandas\\sample\\demo14\\Orders.xlsx", dtype={'Date': date})
Orders["year"]=pd.DatetimeIndex(Orders.Date).year
#方法一:
tb = Orders.pivot_table(values=["Total","ID"],index="Category",columns="year",aggfunc={"Total":np.sum,"ID":len})
print(tb)
ID Total
year 2011 2012 2013 2014 2011 2012 2013 2014
Category
Accessories 360 1339 20684 18811 2.082077e+04 1.024398e+05 6.750247e+05 4.737876e+05
Bikes 3826 10776 16485 8944 1.194565e+07 2.898552e+07 3.626683e+07 1.745318e+07
Clothing 655 4045 10266 6428 3.603148e+04 5.555877e+05 1.067690e+06 4.612336e+05
Components 875 5529 9138 3156 6.391730e+05 3.880758e+06 5.612935e+06 1.669727e+06
pivot_table图解说明
#方法二:
g = Orders.groupby(["Category","year"])
s = g["Total"].sum()
c = g["ID"].count()
tb2 = pd.DataFrame({"sum":s,"count":c})
print(tb2)
打印结果:
sum count
Category year
Accessories 2011 2.082077e+04 360
2012 1.024398e+05 1339
2013 6.750247e+05 20684
2014 4.737876e+05 18811
Bikes 2011 1.194565e+07 3826
2012 2.898552e+07 10776
2013 3.626683e+07 16485
2014 1.745318e+07 8944
Clothing 2011 3.603148e+04 655
2012 5.555877e+05 4045
2013 1.067690e+06 10266
2014 4.612336e+05 6428
Components 2011 6.391730e+05 875
2012 3.880758e+06 5529
2013 5.612935e+06 9138
2014 1.669727e+06 3156