Grouped Data Operations

import pandas as pd
import numpy as np

Keeping the original shape after a grouped operation

dict_obj = {'key1' : ['a', 'b', 'a', 'b', 
                      'a', 'b', 'a', 'a'],
            'key2' : ['one', 'one', 'two', 'three',
                      'two', 'two', 'one', 'three'],
            'data1': np.random.randint(1, 10, 8),
            'data2': np.random.randint(1, 10, 8)}
df_obj = pd.DataFrame(dict_obj)
df_obj

Group by key1, compute summary statistics for data1 and data2, and attach them to the original table

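# Select the numeric columns explicitly so the string column key2 is not summed
k1_sum = df_obj.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')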
k1_sum

Method: use merge

pd.merge(df_obj, k1_sum, left_on='key1', right_index=True)
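To make the merge step easier to follow, here is a minimal sketch on a small frame with fixed values. The frame `df` and its numbers are assumptions chosen for illustration, not the random data used above. It shows that every original row is kept and picks up the sum of its own group.

```python
import pandas as pd

# Hypothetical fixed data (not from the article) so the result is reproducible.
df = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1, 2, 3, 4],
    'data2': [10, 20, 30, 40],
})

# Per-group sums, indexed by key1, with prefixed column names.
k1_sum = df.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')

# Match each row's key1 value against the index of k1_sum.
merged = pd.merge(df, k1_sum, left_on='key1', right_index=True)

# Row order after a merge can differ between pandas versions,
# so sort by the original row labels before displaying.
print(merged.sort_index())
```

The merged frame has the same number of rows as `df`, plus the two `sum_` columns.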

dataset_path = './starcraft.csv'
df_data = pd.read_csv(dataset_path, usecols=['LeagueIndex', 'Age', 'HoursPerWeek', 
                                             'TotalHours', 'APM'])

apply

def top_n(df, n=3, column='APM'):
    """
        Return the top n rows of each group, ranked by column
    """
    return df.sort_values(by=column, ascending=False)[:n]

df_data.groupby('LeagueIndex').apply(top_n)

Extra arguments passed to apply are forwarded to the custom function

df_data.groupby('LeagueIndex').apply(top_n, n=2, column='Age')
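The forwarding can be tried without the CSV file. The sketch below uses a small hand-made frame as a stand-in for starcraft.csv; its values are assumptions for illustration. It also selects the value columns before calling apply, which is a choice made here because recent pandas releases have changed how the grouping column is handed to the applied function.

```python
import pandas as pd

# Hypothetical stand-in for starcraft.csv (values invented for illustration).
df = pd.DataFrame({
    'LeagueIndex': [1, 1, 1, 2, 2, 2],
    'Age': [20, 25, 22, 30, 18, 27],
    'APM': [50, 80, 65, 120, 90, 150],
})

def top_n(group, n=3, column='APM'):
    """Return the top n rows of a group, ranked by column."""
    return group.sort_values(by=column, ascending=False)[:n]

grouped = df.groupby('LeagueIndex')[['Age', 'APM']]

# No extra arguments: top_n runs with its defaults (n=3, column='APM').
default_top = grouped.apply(top_n)

# n and column are passed through apply to top_n as keyword arguments.
oldest_two = grouped.apply(top_n, n=2, column='Age')

print(default_top)
print(oldest_two)
```

In the second call each group keeps only its two highest Age values, because apply hands `n=2` and `column='Age'` straight to `top_n`.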

Leave the group keys out of the result index with group_keys=False

df_data.groupby('LeagueIndex', group_keys=False).apply(top_n)
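The effect of group_keys shows up most clearly in the index of the result. This is a rough sketch on invented data (the frame below is an assumption, not the StarCraft dataset), again selecting the value column before apply to keep behavior stable across pandas versions.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'LeagueIndex': [1, 1, 2, 2],
    'APM': [80, 50, 90, 120],
})

def top_n(group, n=1, column='APM'):
    """Return the top n rows of a group, ranked by column."""
    return group.sort_values(by=column, ascending=False)[:n]

# group_keys=True: the group label becomes an extra outer index level.
with_keys = df.groupby('LeagueIndex', group_keys=True)[['APM']].apply(top_n)

# group_keys=False: only the original row labels remain in the index.
without_keys = df.groupby('LeagueIndex', group_keys=False)[['APM']].apply(top_n)

print(with_keys.index.nlevels)
print(without_keys.index.nlevels)
```

The selected rows are the same in both cases; only the index differs. With the keys suppressed, the result keeps the row labels of the original frame, which makes it easier to align back to that frame.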

Last edited: 2017.12.10 06:50:52
