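pandas objects come with a set of common mathematical and statistical methods. Most are reductions or summary statistics: they extract a single value (such as the sum or mean) from a Series, or a Series of values from the rows or columns of a DataFrame. Compared with the similar methods on NumPy arrays, they have built-in handling for missing data.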
In [35]: import numpy as np
In [36]: import pandas as pd
In [37]: df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
...: [np.nan, np.nan], [0.75, -1.3]],
...: index=['a', 'b', 'c', 'd'],
...: columns=['one', 'two'])
In [38]: df
Out[38]:
    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3
Calling sum returns a Series of column sums. Passing axis='columns' or axis=1 sums across the columns instead, giving one sum per row:
In [39]: df.sum()
Out[39]:
one 9.25
two -5.80
dtype: float64
In [40]: df.sum(axis='columns')
Out[40]:
a 1.40
b 2.60
c 0.00
d -0.55
dtype: float64
In [41]: df.sum(axis=1)
Out[41]:
a 1.40
b 2.60
c 0.00
d -0.55
dtype: float64
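Row c contains only NaN, yet its sum above is 0.00, because skipped values contribute nothing to the total. The sketch below is not from the original notes. It assumes a pandas version whose sum accepts the min_count argument (0.22 or later, as far as I know) and shows one way to get NaN for such rows instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

# Default behaviour: NaN is skipped, so the all-NaN row 'c' sums to 0.0.
row_sums = df.sum(axis='columns')

# min_count=1 requires at least one non-NA value per row;
# rows with fewer valid values yield NaN instead of 0.0.
row_sums_strict = df.sum(axis='columns', min_count=1)

print(row_sums)
print(row_sums_strict)
```

Rows a, b and d are unaffected, since each has at least one valid value.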
NA values are skipped by default. Passing skipna=False disables this, so any row containing an NA yields NaN:
In [42]: df.mean(axis='columns', skipna=False)
Out[42]:
a NaN
b 1.300
c NaN
d -0.275
dtype: float64
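To make the contrast with NumPy concrete, here is a small sketch (the variable names are my own) that compares a plain NumPy reduction with the pandas equivalent on the values of column one:

```python
import numpy as np
import pandas as pd

values = np.array([1.4, 7.1, np.nan, 0.75])
series = pd.Series(values)

# A plain NumPy reduction propagates NaN.
numpy_mean = values.mean()

# NumPy needs the separate nan-aware function to skip missing values.
numpy_nanmean = np.nanmean(values)

# pandas skips NaN by default, and propagates it only on request.
pandas_mean = series.mean()
pandas_strict_mean = series.mean(skipna=False)

print(numpy_mean, numpy_nanmean, pandas_mean, pandas_strict_mean)
```

With the defaults, series.mean() agrees with np.nanmean rather than with ndarray.mean.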
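Options for reduction methods
Argument | Description
axis | Axis to reduce over: 0 for the DataFrame's rows, 1 for its columns
skipna | Exclude missing values; True by default
level | Reduce grouped by level if the axis is hierarchically indexed (MultiIndex)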
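A note on level: to my knowledge, recent pandas releases no longer accept level on reductions such as sum (I believe it was deprecated in 1.3 and removed in 2.0), so check this against your installed version. The sketch below shows the groupby(level=...) form that gives the same per-level reduction. The two-level index and its names key1 and key2 are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index, invented for this example.
index = pd.MultiIndex.from_arrays(
    [['x', 'x', 'y', 'y'], [1, 2, 1, 2]],
    names=['key1', 'key2'])

frame = pd.DataFrame({'one': [1.4, 7.1, np.nan, 0.75],
                      'two': [np.nan, -4.5, np.nan, -1.3]},
                     index=index)

# Sum within each value of the outer level; NaN is skipped as usual.
by_key1 = frame.groupby(level='key1').sum()

print(by_key1)
```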
Some methods, such as idxmin and idxmax, return indirect statistics: the index labels at which the minimum or maximum values occur.
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
...: [np.nan, np.nan], [0.75, -1.3]],
...: index=['a', 'b', 'c', 'd'],
...: columns=['one', 'two'])
In [4]: df
Out[4]:
    one  two
a  1.40  NaN
b  7.10 -4.5
c   NaN  NaN
d  0.75 -1.3
In [6]: df.idxmax()
Out[6]:
one b
two d
dtype: object
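The labels returned by idxmax and idxmin can be fed back into loc to recover the values themselves. A minimal sketch using the same df (the variable names are my own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

max_labels = df.idxmax()  # row label of each column's maximum
min_labels = df.idxmin()  # row label of each column's minimum

# Look up the extreme value of column 'one' through its label.
largest_one = df.loc[max_labels['one'], 'one']
smallest_one = df.loc[min_labels['one'], 'one']

print(max_labels, min_labels, largest_one, smallest_one, sep='\n')
```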
Other methods are accumulations, such as cumsum:
In [7]: df.cumsum()
Out[7]:
    one  two
a  1.40  NaN
b  8.50 -4.5
c   NaN  NaN
d  9.25 -5.8
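In the cumsum output, NaN positions stay NaN while the running total carries on past them. The sketch below shows a sibling accumulation, cummax, and what changes when skipna=False is passed to cumsum. It is meant as an illustration, not an exhaustive list of accumulation methods:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

# Running maximum down each column; NaN positions stay NaN.
running_max = df.cummax()

# With skipna=False, everything from the first NaN onward becomes NaN.
running_sum_strict = df.cumsum(skipna=False)

print(running_max)
print(running_sum_strict)
```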
describe is neither a reduction nor an accumulation; it produces multiple summary statistics in one shot:
In [8]: df.describe()
Out[8]:
            one       two
count  3.000000  2.000000
mean   3.083333 -2.900000
std    3.493685  2.262742
min    0.750000 -4.500000
25%    1.075000 -3.700000
50%    1.400000 -2.900000
75%    4.250000 -2.100000
max    7.100000 -1.300000
On non-numeric data, describe produces alternative summary statistics:
In [9]: obj = pd.Series(['a', 'a', 'b', 'c'] * 4)
In [10]: obj.describe()
Out[10]:
count     16
unique     3
top        a
freq       8
dtype: object
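Two follow-ups on describe, sketched under the assumption that your pandas version supports the percentiles argument. The first asks for custom percentiles on the numeric df. The second uses value_counts to show where the top and freq entries for obj come from:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

# Request the 10th and 90th percentiles instead of the default quartiles.
summary = df.describe(percentiles=[0.1, 0.9])

obj = pd.Series(['a', 'a', 'b', 'c'] * 4)

# value_counts sorts by frequency; its first entry is describe's top/freq.
counts = obj.value_counts()
top_value = counts.index[0]
top_freq = counts.iloc[0]

print(summary)
print(top_value, top_freq)
```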
The full list of summary statistics and related methods is on page 159 of 《利用Python进行数据分析》 (Python for Data Analysis).