1. Group By
1.1 Split-apply-combine
By “group by” we are referring to a process involving one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure
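The three steps listed above can be sketched by hand and then compared with a single groupby call. This is a minimal illustration, not how pandas implements groupby internally; the frame and its column names "key" and "value" are hypothetical.

```python
import pandas as pd

# Hypothetical example data: a grouping column "key" and a numeric column "value".
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

# Split: partition the rows into one sub-frame per distinct key.
pieces = {k: df[df["key"] == k] for k in df["key"].unique()}

# Apply: sum the "value" column of each piece independently.
sums = {k: piece["value"].sum() for k, piece in pieces.items()}

# Combine: collect the per-group results into a Series indexed by key.
manual = pd.Series(sums, name="value").sort_index()

# groupby performs all three steps in a single call (keys sorted by default).
builtin = df.groupby("key")["value"].sum()

assert manual.equals(builtin)
print(builtin)
```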
The group by concept will be familiar to anyone who has used a SQL-based tool, in which you can write code like:
SELECT Column1, Column2, AVG(Column3), SUM(Column4)
FROM SomeTable
GROUP BY Column1, Column2
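A rough pandas counterpart of the SQL query above, which averages Column3 and sums Column4 per (Column1, Column2) pair, might look like the sketch below. The contents of SomeTable are made up for illustration, and the output column names mean_Column3 and sum_Column4 are arbitrary choices made through named aggregation.

```python
import pandas as pd

# Hypothetical stand-in for SomeTable; the column names mirror the SQL query.
some_table = pd.DataFrame({
    "Column1": ["x", "x", "y", "y", "y"],
    "Column2": ["p", "p", "p", "q", "q"],
    "Column3": [1.0, 3.0, 5.0, 2.0, 4.0],
    "Column4": [10, 20, 30, 40, 50],
})

# GROUP BY Column1, Column2; as_index=False keeps the keys as ordinary columns,
# like the SELECT list in SQL. Named aggregation: new_name=(column, function).
result = (
    some_table
    .groupby(["Column1", "Column2"], as_index=False)
    .agg(mean_Column3=("Column3", "mean"), sum_Column4=("Column4", "sum"))
)
print(result)
```

Here the group (x, p) averages 1.0 and 3.0 to 2.0 and sums 10 and 20 to 30.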
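1.2 Splitting an object into groups
1.2.1 GroupBy sorting
By default the group keys are sorted during the groupby operation. Passing sort=False skips this sort, which can speed up the operation; the groups then appear in the order in which their keys first occur: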
In [13]: df2 = pd.DataFrame({'X' : ['B', 'B', 'A', 'A'], 'Y' : [1, 2, 3, 4]})
In [14]: df2.groupby(['X']).sum()
Out[14]:
   Y
X
A  7
B  3
In [15]: df2.groupby(['X'], sort=False).sum()
Out[15]:
   Y
X
B  3
A  7
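The effect of sort on group order can also be seen by iterating over the groups directly. This sketch reuses the df2 data from the session above; it shows that only the order of the groups changes, not the per-group results.

```python
import pandas as pd

df2 = pd.DataFrame({"X": ["B", "B", "A", "A"], "Y": [1, 2, 3, 4]})

# Default (sort=True): groups are produced in sorted key order.
sorted_keys = [key for key, _ in df2.groupby("X")]

# sort=False: groups are produced in the order their keys first appear.
unsorted_keys = [key for key, _ in df2.groupby("X", sort=False)]

# Only the group order differs; the per-group sums are identical.
same_totals = df2.groupby("X").sum().equals(
    df2.groupby("X", sort=False).sum().sort_index()
)

print(sorted_keys)    # ['A', 'B']
print(unsorted_keys)  # ['B', 'A']
print(same_totals)    # True
```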
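1.3 Selecting a group
A single group can be selected with get_group(). In the session below, df is a DataFrame with key columns A and B and numeric columns C and D, and grouped = df.groupby('A'):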
In [52]: grouped.get_group('bar')
Out[52]:
     A      B         C         D
1  bar    one  0.254161  1.511763
3  bar  three  0.215897 -0.990582
5  bar    two -0.077118  1.211526
In [53]: df.groupby(['A', 'B']).get_group(('bar', 'one'))
Out[53]:
     A    B         C         D
1  bar  one  0.254161  1.511763
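Because the df in the session above was built from random numbers, here is a self-contained sketch with small made-up values. It shows the two call forms (a single key value versus a tuple when grouping by several columns), that the returned rows keep their original index labels, and that an absent key raises KeyError.

```python
import pandas as pd

# Small deterministic stand-in for df; the values are made up for illustration.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "two"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Grouping by one column: pass the key value itself.
bar_rows = df.groupby("A").get_group("bar")

# Grouping by several columns: pass a tuple with one entry per grouping column.
bar_one = df.groupby(["A", "B"]).get_group(("bar", "one"))

# Asking for a key that does not occur raises KeyError.
try:
    df.groupby("A").get_group("baz")
    missing_raises = False
except KeyError:
    missing_raises = True

print(bar_rows)  # rows with index labels 1 and 3
print(bar_one)   # the row with index label 1
```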
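1.4 Aggregation
Once a GroupBy object has been created, aggregate() (or its alias agg()) reduces each group to a single row. Here the numeric columns C and D are selected first, since summing the string column B is not a meaningful aggregation:
In [54]: grouped = df.groupby('A')
In [55]: grouped[['C', 'D']].aggregate('sum')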
Out[55]:
            C         D
A
bar  0.392940  1.732707
foo -1.796421  2.824590
In [56]: grouped = df.groupby(['A', 'B'])
In [57]: grouped.aggregate('sum')
Out[57]:
                  C         D
A   B
bar one    0.254161  1.511763
    three  0.215897 -0.990582
    two   -0.077118  1.211526
foo one   -0.983776  1.614581
    three -0.862495  0.024580
    two    0.049851  1.185429
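Besides a single function, aggregate() also accepts a list of functions or a dict mapping columns to functions. The sketch below uses small made-up data so the results can be checked by hand. It assumes, as in the session above, that non-numeric columns should be excluded before summing; depending on the pandas version, summing a string column such as B may concatenate its values or raise rather than drop the column.

```python
import pandas as pd

# Made-up data: two key columns (A, B) and two numeric columns (C, D).
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "B": ["one", "one", "two", "two", "one"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0],
    "D": [10, 20, 30, 40, 50],
})

grouped = df.groupby("A")

# One function: select the numeric columns, then sum each per group.
totals = grouped[["C", "D"]].aggregate("sum")

# A list of functions: one output column per (input column, function) pair.
stats = grouped[["C", "D"]].aggregate(["sum", "mean"])

# A dict: a different function for each column (other columns are left out).
mixed = grouped.aggregate({"C": "sum", "D": "max"})

print(totals)
print(stats)
print(mixed)
```

For the foo group, C sums to 9.0 (1 + 3 + 5) and D peaks at 50. Passing as_index=False to groupby() returns the group keys as ordinary columns instead of the index.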