1. Series index basics: a review
2. DataFrame index basics: a review
3. Series with duplicate index labels
4. DataFrame indexing
Last updated: 2018-01-30
1. Series index basics: a review
import pandas as pd
import numpy as np
s=pd.Series(np.random.rand(5),index=list("abcde"))
s
Out[4]:
a 0.492619
b 0.986350
c 0.330461
d 0.141790
e 0.922023
dtype: float64
s.index
Out[5]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
s.index.name="alpha"
s
Out[7]:
alpha
a 0.492619
b 0.986350
c 0.330461
d 0.141790
e 0.922023
dtype: float64
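As a small supplement to the review above, the index name can also be supplied when the Series is created by passing a named `pd.Index`. This is only a sketch; the fixed values and the variable names are assumptions chosen so the result is reproducible.

```python
import pandas as pd

# Fixed values instead of random ones so the example is reproducible.
labels = pd.Index(list("abc"), name="alpha")
s_named = pd.Series([0.1, 0.2, 0.3], index=labels)

print(s_named.index.name)   # alpha
print(s_named.loc["b"])     # 0.2
```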
2. DataFrame index basics: a review
import pandas as pd
import numpy as np
df=pd.DataFrame(np.random.randn(4,3),columns=["one","two","three"])
df
Out[9]:
one two three
0 -0.000841 -0.707295 -0.185961
1 0.715077 -0.284825 -0.331871
2 -0.685716 0.399389 1.529932
3 1.277260 -0.124277 1.462555
df.index
Out[10]: RangeIndex(start=0, stop=4, step=1)
df.columns
Out[11]: Index(['one', 'two', 'three'], dtype='object')
df.columns.name="col"
df.index.name="row"
df
Out[14]:
col one two three
row
0 -0.000841 -0.707295 -0.185961
1 0.715077 -0.284825 -0.331871
2 -0.685716 0.399389 1.529932
3 1.277260 -0.124277 1.462555
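Assigning to `.name` changes the existing object in place. A possible alternative, sketched here with deterministic data that is not part of the original session, is `rename_axis`, which returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Deterministic data (an assumption for illustration only).
df_plain = pd.DataFrame(np.arange(12).reshape(4, 3),
                        columns=["one", "two", "three"])

# rename_axis names both axes and returns a new object.
df_named = df_plain.rename_axis(index="row", columns="col")

print(df_named.index.name)    # row
print(df_named.columns.name)  # col
print(df_plain.index.name)    # None
```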
3. Series with duplicate index labels
s=pd.Series(np.arange(6),index=list("abcbda"))
s
Out[16]:
a 0
b 1
c 2
b 3
d 4
a 5
dtype: int32
3.1 Selecting a duplicated label
Selecting a label that occurs more than once returns a Series.
s["a"]
Out[17]:
a 0
a 5
dtype: int32
3.2 Selecting a unique label
Selecting a label that occurs only once returns a single value.
s["c"]
Out[18]: 2
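Because the return type depends on whether the label is duplicated, code that needs a predictable type can select with a list of labels, which yields a Series in both cases. A minimal sketch (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

s_dup = pd.Series(np.arange(6), index=list("abcbda"))

dup = s_dup["a"]       # label occurs twice -> Series
single = s_dup["c"]    # label occurs once  -> scalar

print(type(dup).__name__)   # Series
print(dup.tolist())         # [0, 5]
print(single)               # 2

# Selecting with a list of labels returns a Series even for a unique label.
as_series = s_dup.loc[["c"]]
print(as_series.tolist())   # [2]
```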
3.3 Checking whether the index contains duplicates
s.index.is_unique
Out[19]: False
3.4 Getting the unique index labels
s.index.unique()
Out[20]: Index(['a', 'b', 'c', 'd'], dtype='object')
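`is_unique` is a property, whereas `unique` is a method and has to be called with parentheses to return the labels. The sketch below also uses `Index.duplicated` to keep only the first occurrence of each label, which is one possible way to make the index unique.

```python
import numpy as np
import pandas as pd

s_dup = pd.Series(np.arange(6), index=list("abcbda"))

print(s_dup.index.is_unique)        # False
print(list(s_dup.index.unique()))   # ['a', 'b', 'c', 'd']

# True marks every repeat of a label after its first occurrence.
mask = s_dup.index.duplicated(keep="first")
first_only = s_dup[~mask]

print(first_only.index.is_unique)   # True
print(first_only.to_dict())         # {'a': 0, 'b': 1, 'c': 2, 'd': 4}
```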
3.5 Grouping by index label and aggregating the values
- Sum: for example a = 0 + 5 = 5 and b = 1 + 3 = 4
s.groupby(s.index).sum()
Out[21]:
a 5
b 4
c 2
d 4
dtype: int32
- Mean
s.groupby(s.index).mean()
Out[22]:
a 2.5
b 2.0
c 2.0
d 4.0
dtype: float64
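Grouping on the index can also be written as `groupby(level=0)`, and several aggregations can be computed in one pass with `agg`. This is a sketch on the same data; the `count` column is an addition for illustration.

```python
import numpy as np
import pandas as pd

s_dup = pd.Series(np.arange(6), index=list("abcbda"))

# level=0 groups on the index itself; agg with a list returns a DataFrame
# with one column per aggregation.
summary = s_dup.groupby(level=0).agg(["sum", "mean", "count"])
print(summary)
```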
3.6 Indexing a Series with a MultiIndex
a=[["a","a","a","b","b","c","c"],[1,2,3,1,2,2,3]]
t=list(zip(*a))
t
Out[25]: [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('c', 2), ('c', 3)]
index=pd.MultiIndex.from_tuples(t,names=["level1","level2"])
index
Out[29]:
MultiIndex(levels=[['a', 'b', 'c'], [1, 2, 3]],
labels=[[0, 0, 0, 1, 1, 2, 2], [0, 1, 2, 0, 1, 1, 2]],
names=['level1', 'level2'])
s=pd.Series(np.random.rand(7),index=index)
s
Out[31]:
level1  level2
a       1         0.225854
        2         0.599906
        3         0.794228
b       1         0.715520
        2         0.114191
c       2         0.796958
        3         0.712853
dtype: float64
- Selecting on the outermost level
s["a"]
Out[32]:
level2
1 0.225854
2 0.599906
3 0.794228
dtype: float64
s["b":"c"]
Out[33]:
level1  level2
b       1         0.715520
        2         0.114191
c       2         0.796958
        3         0.712853
dtype: float64
s[["a","c"]]
Out[34]:
level1  level2
a       1         0.225854
        2         0.599906
        3         0.794228
c       2         0.796958
        3         0.712853
dtype: float64
- Selecting on the innermost level
s[:,2]
Out[35]:
level1
a 0.599906
b 0.114191
c 0.796958
dtype: float64
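The same selections can be spelled out more explicitly with `.loc` and `xs`, which may be easier to read when the levels have names. The sketch below uses fixed values (0, 10, ..., 60) instead of random numbers so the results can be checked by hand.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["a", "a", "a", "b", "b", "c", "c"], [1, 2, 3, 1, 2, 2, 3]],
    names=["level1", "level2"],
)
# Fixed values (an assumption for illustration): 0, 10, ..., 60.
s_multi = pd.Series(np.arange(7) * 10, index=index)

outer = s_multi.loc["a"]                 # all rows under level1 == "a"
one = s_multi.loc[("b", 2)]              # a single element, addressed by a tuple
inner = s_multi.xs(2, level="level2")    # cross-section on the inner level

print(outer.tolist())    # [0, 10, 20]
print(one)               # 40
print(inner.to_dict())   # {'a': 10, 'b': 40, 'c': 50}
```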
4. DataFrame indexing
4.1 Indexing a DataFrame with a MultiIndex
df=pd.DataFrame(np.random.randint(1,10,(4,3)),index=[["a","a","b","b"],[1,2,1,2]],columns=[["one","one","two"],["blue","red","blue"]])
# Level names of a MultiIndex are set through .names (plural)
df.index.names=["row-1","row-2"]
df.columns.names=["col-1","col-2"]
df
Out[39]:
col-1        one     two
col-2       blue red blue
row-1 row-2
a     1        5   7    4
      2        5   1    2
b     1        1   7    7
      2        7   8    6
- Selecting on the outermost level
df.loc["a"]
Out[42]:
col-1  one     two
col-2 blue red blue
row-2
1        5   7    4
2        5   1    2
- Selecting on the outermost and the innermost level together
df.loc["a",1]
Out[43]:
col-1  col-2
one    blue     5
       red      7
two    blue     4
Name: (a, 1), dtype: int32
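When both axes carry a MultiIndex, a tuple per axis addresses a single cell, and `xs` takes a cross-section on a column level. The sketch below rebuilds the frame from the integer values shown in the output above, so it no longer depends on random numbers; the level names are passed directly to the constructors.

```python
import pandas as pd

rows = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["row-1", "row-2"])
cols = pd.MultiIndex.from_arrays(
    [["one", "one", "two"], ["blue", "red", "blue"]], names=["col-1", "col-2"])
# Values copied from the displayed output, not regenerated randomly.
df_multi = pd.DataFrame(
    [[5, 7, 4], [5, 1, 2], [1, 7, 7], [7, 8, 6]], index=rows, columns=cols)

row = df_multi.loc[("a", 1)]                      # one row, returned as a Series
cell = df_multi.loc[("a", 1), ("one", "red")]     # one cell
blue = df_multi.xs("blue", axis=1, level="col-2") # every "blue" column

print(row.tolist())         # [5, 7, 4]
print(cell)                 # 7
print(list(blue.columns))   # ['one', 'two']
```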
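- Sorting a MultiIndex
Sorting by the first level
# sortlevel is no longer available in current pandas; sort_index(level=...) replaces it
df.sort_index(level=0)
Out[48]:
col-1        one     two
col-2       blue red blue
row-1 row-2
a     1        5   7    4
      2        5   1    2
b     1        1   7    7
      2        7   8    6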
Sorting by the second level
df.sort_index(level=1)
Out[49]:
col-1        one     two
col-2       blue red blue
row-1 row-2
a     1        5   7    4
b     1        1   7    7
a     2        5   1    2
b     2        7   8    6
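The frame above happens to be sorted already, so sorting by the first level leaves it unchanged. The effect is easier to see on a deliberately shuffled index; the following is a small sketch with made-up values.

```python
import pandas as pd

# A shuffled two-level index with illustrative values.
shuffled = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["row-1", "row-2"]),
)

# level=0 sorts on the outer level first, then on the remaining level.
by_outer = shuffled.sort_index(level=0)
# level=1 sorts on the inner level first, then on the remaining level.
by_inner = shuffled.sort_index(level=1)

print(list(by_outer.index))  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
print(list(by_inner.index))  # [('a', 1), ('b', 1), ('a', 2), ('b', 2)]
```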
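- Aggregating by level
Sum
# sum over the first level (sum(level=...) is no longer available in current pandas)
df.groupby(level=0).sum()
Out[50]:
col-1  one     two
col-2 blue red blue
row-1
a       10   8    6
b        8  15   13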
# sum over the second level
df.groupby(level=1).sum()
Out[51]:
col-1  one     two
col-2 blue red blue
row-2
1        6  14   11
2       12   9    8
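Levels can be referenced by name as well as by position, and the same idea extends to column levels by transposing first. The sketch below reuses the integer values shown earlier; grouping the columns through a transpose is one possible approach rather than the only one.

```python
import pandas as pd

rows = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["row-1", "row-2"])
cols = pd.MultiIndex.from_arrays(
    [["one", "one", "two"], ["blue", "red", "blue"]], names=["col-1", "col-2"])
df_multi = pd.DataFrame(
    [[5, 7, 4], [5, 1, 2], [1, 7, 7], [7, 8, 6]], index=rows, columns=cols)

# Sum the rows that share the same outer / inner row label.
by_row1 = df_multi.groupby(level="row-1").sum()
by_row2 = df_multi.groupby(level="row-2").sum()

# Sum the columns that share the same outer column label.
by_col1 = df_multi.T.groupby(level="col-1").sum().T

print(by_row1.values.tolist())  # [[10, 8, 6], [8, 15, 13]]
print(by_row2.values.tolist())  # [[6, 14, 11], [12, 9, 8]]
print(by_col1.values.tolist())  # [[12, 4], [6, 2], [8, 7], [15, 6]]
```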
4.2 Simple DataFrame indexes
df=pd.DataFrame({
"a":range(7),
"b":range(7,0,-1),
"c":["one","one","one","two","two","two","two"],
"d":[0,1,2,0,1,2,3]
})
df
Out[54]:
a b c d
0 0 7 one 0
1 1 6 one 1
2 2 5 one 2
3 3 4 two 0
4 4 3 two 1
5 5 2 two 2
6 6 1 two 3
- Use one column as the index
df.set_index("c")
Out[55]:
a b d
c
one 0 7 0
one 1 6 1
one 2 5 2
two 3 4 0
two 4 3 1
two 5 2 2
two 6 1 3
- Use two columns as a two-level index
df2=df.set_index(["c","d"])
df2
Out[57]:
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
- Move the index levels back into columns
df2.reset_index()
Out[59]:
c d a b
0 one 0 0 7
1 one 1 1 6
2 one 2 2 5
3 two 0 3 4
4 two 1 4 3
5 two 2 5 2
6 two 3 6 1
df2.reset_index().sort_index(axis="columns")
Out[62]:
a b c d
0 0 7 one 0
1 1 6 one 1
2 2 5 one 2
3 3 4 two 0
4 4 3 two 1
5 5 2 two 2
6 6 1 two 3
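`set_index` and `reset_index` are inverse operations apart from column order. The sketch below checks the round trip and also resets only one level, which keeps the other level as the index; the names `two_level`, `restored` and `partial` are illustrative.

```python
import pandas as pd

df_flat = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})

two_level = df_flat.set_index(["c", "d"])

# reset_index puts c and d first; reselecting the columns restores the order.
restored = two_level.reset_index()[df_flat.columns]
print(restored.equals(df_flat))   # True

# Resetting a single level keeps the other one as the index.
partial = two_level.reset_index(level="d")
print(partial.index.name)         # c
print(list(partial.columns))      # ['d', 'a', 'b']
```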