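A pandas DataFrame has both a row index and a column index, so the same indexing syntax can select different things depending on what is passed in. These are notes to come back to when I forget.
- Both np.array and pd.Series produce a boolean index when used with a comparison operator. A plain Python list does not have this property.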
# np.array
In: a = np.array([1,5,7,8,9,10])
In: a_bool = a > 8  # a plain list cannot do this element-wise comparison
In: a_bool
Out: array([False, False, False, False, True, True])
# pd.DataFrame and pd.Series
In: data = pd.DataFrame(np.arange(16).reshape(4, 4),
                        index=['Ohio', 'Colorado', 'Utah', 'New York'],
                        columns=['one', 'two', 'three', 'four'])
In: data
Out:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
In: data['three'] > 5  # comparing a column gives a boolean index
Out:
Ohio        False
Colorado     True
Utah         True
New York     True
Name: three, dtype: bool
In: data[data['three'] > 5]  # here the boolean index passed to the DataFrame selects rows
Out:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
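The boolean-index behaviour above can be checked in a short script. This is only a minimal sketch and not part of the original notes: the names `values`, `row_mask`, `filtered`, and `explicit` are illustrative, and `.loc` is shown as a more explicit spelling of the same row filter.

```python
import numpy as np
import pandas as pd

a = np.array([1, 5, 7, 8, 9, 10])
mask = a > 8                       # element-wise comparison gives a boolean array
selected = a[mask]                 # keeps only the elements where mask is True

values = [1, 5, 7, 8, 9, 10]       # the same numbers as a plain list
try:
    _ = values > 8                 # a list is not compared element-wise
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True

data = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)
row_mask = data["three"] > 5       # boolean Series aligned with the row index
filtered = data[row_mask]          # a boolean Series inside [] selects rows
explicit = data.loc[row_mask]      # the same selection written with .loc
```

With a list, the usual workaround is a comprehension such as `[v > 8 for v in values]`.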
- A DataFrame itself can also be indexed in several ways
# Indexing with a slice selects rows
In: data[:2]
Out:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
# Indexing with column labels selects columns
In: data[['three', 'one']]
Out:
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
# Indexing with a boolean Series still selects rows
In: data[data['three'] > 5]
Out:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
# Comparing the whole DataFrame gives a boolean value for every position
In: data < 5
Out:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False
In: data[data < 5] = 0  # set every position where the condition holds to 0
In: data
Out:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
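Because `[]` means rows or columns depending on what is passed in, it can help to write the axis out explicitly. The following is a sketch under the assumption that `data` is the original 4x4 frame from these notes. `.iloc`, `.loc`, and `DataFrame.mask` are standard pandas calls that the notes themselves do not use, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

first_two_rows = data.iloc[:2]               # same rows as data[:2]
two_columns = data.loc[:, ["three", "one"]]  # same columns as data[['three', 'one']]
rows_over_5 = data.loc[data["three"] > 5]    # same rows as data[data['three'] > 5]

# mask() replaces values where the condition is True and returns a new frame,
# so data itself is left unchanged (unlike data[data < 5] = 0).
zeroed = data.mask(data < 5, 0)
```

Whether to assign in place or use `mask` likely depends on whether the original values are still needed afterwards.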