pandas: Indexing

1. Series index basics (review)
2. DataFrame index basics (review)
3. Series with duplicate index labels
4. DataFrame indexing
Last updated: 2018-01-30

1. Series index basics (review)

import pandas as pd
import numpy as np
s=pd.Series(np.random.rand(5),index=list("abcde"))

s
Out[4]: 
a    0.492619
b    0.986350
c    0.330461
d    0.141790
e    0.922023
dtype: float64

s.index
Out[5]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

s.index.name="alpha"
s
Out[7]: 
alpha
a    0.492619
b    0.986350
c    0.330461
d    0.141790
e    0.922023
dtype: float64
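
The steps above can be reproduced deterministically by using fixed values instead of `np.random.rand`. This is only a minimal sketch: the values and the variable name `s_demo` are illustrative assumptions, not part of the session above.

```python
import pandas as pd

# Fixed values (illustrative) so the result is reproducible
s_demo = pd.Series([0.1, 0.2, 0.3], index=list("abc"))

# The index is a separate object that holds the labels
labels = list(s_demo.index)
print(labels)              # ['a', 'b', 'c']

# Naming the index changes how it is displayed, not the data
s_demo.index.name = "alpha"
print(s_demo.index.name)   # alpha
print(s_demo["b"])         # 0.2
```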

2. DataFrame index basics (review)

import pandas as pd
import numpy as np
df=pd.DataFrame(np.random.randn(4,3),columns=["one","two","three"])
df
Out[9]: 
        one       two     three
0 -0.000841 -0.707295 -0.185961
1  0.715077 -0.284825 -0.331871
2 -0.685716  0.399389  1.529932
3  1.277260 -0.124277  1.462555

df.index
Out[10]: RangeIndex(start=0, stop=4, step=1)

df.columns
Out[11]: Index(['one', 'two', 'three'], dtype='object')

df.columns.name="col"
df.index.name="row"
df
Out[14]: 
col       one       two     three
row                              
0   -0.000841 -0.707295 -0.185961
1    0.715077 -0.284825 -0.331871
2   -0.685716  0.399389  1.529932
3    1.277260 -0.124277  1.462555
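
Assigning to `.name` modifies the existing axis objects in place. As a possible alternative, `rename_axis` returns a new DataFrame and leaves the original untouched. The sketch below uses a small frame named `df_demo`, which is a hypothetical stand-in for the random data above.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["one", "two", "three"])

# rename_axis returns a new DataFrame; df_demo itself keeps unnamed axes
named = df_demo.rename_axis(index="row", columns="col")

print(named.index.name, named.columns.name)   # row col
print(df_demo.index.name)                     # None
```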

3. Series with duplicate index labels

s=pd.Series(np.arange(6),index=list("abcbda"))
s
Out[16]: 
a    0
b    1
c    2
b    3
d    4
a    5
dtype: int32

3.1 Selecting a duplicated label

Selecting a label that occurs more than once returns a Series.

s["a"]
Out[17]: 
a    0
a    5
dtype: int32

3.2 Selecting a unique label

Selecting a label that occurs only once returns a single scalar value.

s["c"]
Out[18]: 2
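
Because the return type depends on whether the label repeats, code that consumes the result may need to handle both cases. The sketch below checks the two types explicitly and shows one way to always get a Series back (passing a list to `.loc`); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s_dup = pd.Series(np.arange(6), index=list("abcbda"))

repeated = s_dup["a"]   # label occurs twice -> Series
single = s_dup["c"]     # label occurs once  -> scalar

print(isinstance(repeated, pd.Series))   # True
print(repeated.tolist())                 # [0, 5]
print(single)                            # 2

# Passing a list of labels to .loc returns a Series even for a unique label
always_series = s_dup.loc[["c"]]
print(isinstance(always_series, pd.Series))   # True
```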

3.3 Checking whether a Series index contains duplicates

s.index.is_unique
Out[19]: False

3.4 Getting the unique labels of a Series index

s.index.unique()
Out[20]: Index(['a', 'b', 'c', 'd'], dtype='object')
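
`is_unique` and `unique()` describe the index as a whole. To find out which positions hold a repeated label, `Index.duplicated()` can be used as well. The following is a minimal sketch on the same data; keeping only the first occurrence of each label is just one possible policy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

s_dup = pd.Series(np.arange(6), index=list("abcbda"))

print(s_dup.index.is_unique)           # False
print(s_dup.index.unique().tolist())   # ['a', 'b', 'c', 'd']

# duplicated() marks every repeat of a label after its first occurrence
mask = s_dup.index.duplicated()
print(mask.tolist())                   # [False, False, False, True, False, True]

# Keep only the first value seen for each label
first_only = s_dup[~mask]
print(first_only.tolist())             # [0, 1, 2, 4]
print(first_only.index.is_unique)      # True
```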

3.5 Grouping a Series by its index and aggregating the values

  • Sum: for example a=0+5=5, b=1+3=4
s.groupby(s.index).sum()
Out[21]: 
a    5
b    4
c    2
d    4
dtype: int32
  • Mean
s.groupby(s.index).mean()
Out[22]: 
a    2.5
b    2.0
c    2.0
d    4.0
dtype: float64
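
Grouping by the index can also be written as `groupby(level=0)`, and `agg` can compute several statistics at once. This sketch assumes the same six-element Series; the names `by_level`, `by_index` and `stats` are illustrative.

```python
import numpy as np
import pandas as pd

s_dup = pd.Series(np.arange(6), index=list("abcbda"))

# Two ways to group by the index labels
by_level = s_dup.groupby(level=0).sum()
by_index = s_dup.groupby(s_dup.index).sum()
print(by_level.tolist())   # [5, 4, 2, 4]
print(by_index.tolist())   # [5, 4, 2, 4]

# agg with a list of names returns one column per statistic
stats = s_dup.groupby(level=0).agg(["sum", "mean"])
print(stats.loc["a", "mean"])   # 2.5
```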

3.6 Indexing a Series with a multi-level index (MultiIndex)

a=[["a","a","a","b","b","c","c"],[1,2,3,1,2,2,3]]
t=list(zip(*a))
t
Out[25]: [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('c', 2), ('c', 3)]

index=pd.MultiIndex.from_tuples(t,names=["level1","level2"])
index
Out[29]: 
MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 2),
            ('c', 2),
            ('c', 3)],
           names=['level1', 'level2'])

s=pd.Series(np.random.rand(7),index=index)
s
Out[31]: 
level1  level2
a       1         0.225854
        2         0.599906
        3         0.794228
b       1         0.715520
        2         0.114191
c       2         0.796958
        3         0.712853
dtype: float64
  • Selecting on the outermost level
s["a"]
Out[32]: 
level2
1    0.225854
2    0.599906
3    0.794228
dtype: float64

s["b":"c"]
Out[33]: 
level1  level2
b       1         0.715520
        2         0.114191
c       2         0.796958
        3         0.712853
dtype: float64

s[["a","c"]]
Out[34]: 
level1  level2
a       1         0.225854
        2         0.599906
        3         0.794228
c       2         0.796958
        3         0.712853
dtype: float64
  • Selecting on the innermost level
s[:,2]
Out[35]: 
level1
a    0.599906
b    0.114191
c    0.796958
dtype: float64
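
The selections above can also be written with `.loc` and `xs`, which spell out the level being addressed. This is a sketch with integer values 0 to 6 in place of the random numbers, so the results are reproducible; `s_multi` is a hypothetical name.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("c", 2), ("c", 3)],
    names=["level1", "level2"],
)
s_multi = pd.Series(range(7), index=index)

outer = s_multi.loc["a"]                # all rows under outer label "a"
inner = s_multi.xs(2, level="level2")   # all rows whose inner label is 2
one = s_multi.loc[("b", 2)]             # a complete key returns a scalar

print(outer.tolist())   # [0, 1, 2]
print(inner.tolist())   # [1, 4, 5]
print(one)              # 4
```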

4. DataFrame indexing

4.1 Indexing a DataFrame with a multi-level index

df=pd.DataFrame(np.random.randint(1,10,(4,3)),index=[["a","a","b","b"],[1,2,1,2]],columns=[["one","one","two"],["blue","red","blue"]])

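df.index.names=["row-1","row-2"]

df.columns.names=["col-1","col-2"]

df
Out[39]: 
col-1        one      two
col-2       blue red blue
row-1 row-2              
a     1        5   7    4
      2        5   1    2
b     1        1   7    7
      2        7   8    6
  • Selecting on the outermost level
df.loc["a"]
Out[42]: 
col-1  one      two
col-2 blue red blue
row-2              
1        5   7    4
2        5   1    2
  • Selecting on both the outermost and the innermost level
df.loc[("a",1),:]
Out[43]: 
col-1  col-2
one    blue     5
       red      7
two    blue     4
Name: (a, 1), dtype: int32
  • Sorting a multi-level index
    Sorting by the first level
df.sort_index(level=0)

Out[48]: 
col-1        one      two
col-2       blue red blue
row-1 row-2              
a     1        5   7    4
      2        5   1    2
b     1        1   7    7
      2        7   8    6

Sorting by the second level

df.sort_index(level=1)
Out[49]: 
col-1        one      two
col-2       blue red blue
row-1 row-2              
a     1        5   7    4
b     1        1   7    7
a     2        5   1    2
b     2        7   8    6
  • Aggregating over a multi-level index
    Sum
#sum over the first index level
df.groupby(level=0).sum()
Out[50]: 
col-1  one      two
col-2 blue red blue
row-1              
a       10   8    6
b        8  15   13

#sum over the second index level
df.groupby(level=1).sum()
Out[51]: 
col-1  one      two
col-2 blue red blue
row-2              
1        6  14   11
2       12   9    8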
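
The per-level sums and selections can be checked on a frame built from the sample values shown above rather than from `np.random.randint`. This is a sketch: `df_multi` is a hypothetical name, and the level names are passed to `MultiIndex.from_tuples` directly, which is one of several ways to set them.

```python
import pandas as pd

df_multi = pd.DataFrame(
    [[5, 7, 4], [5, 1, 2], [1, 7, 7], [7, 8, 6]],
    index=pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["row-1", "row-2"]
    ),
    columns=pd.MultiIndex.from_tuples(
        [("one", "blue"), ("one", "red"), ("two", "blue")], names=["col-1", "col-2"]
    ),
)

# Sum over the outer row level, then over the inner row level
outer_sum = df_multi.groupby(level="row-1").sum()
inner_sum = df_multi.groupby(level="row-2").sum()
print(outer_sum.loc["a"].tolist())   # [10, 8, 6]
print(inner_sum.loc[2].tolist())     # [12, 9, 8]

# Select a single cell: complete row key first, then complete column key
cell = df_multi.loc[("b", 2), ("one", "red")]
print(cell)                          # 8

# Select the "blue" columns by their inner column level
blue = df_multi.xs("blue", axis=1, level="col-2")
print(blue.columns.tolist())         # ['one', 'two']
```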

4.2 Simple DataFrame indexing

df=pd.DataFrame({
        "a":range(7),
        "b":range(7,0,-1),
        "c":["one","one","one","two","two","two","two"],
        "d":[0,1,2,0,1,2,3]
        })
        

df
Out[54]: 
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
  • Using one column as a single-level index
df.set_index("c")
Out[55]: 
     a  b  d
c           
one  0  7  0
one  1  6  1
one  2  5  2
two  3  4  0
two  4  3  1
two  5  2  2
two  6  1  3
  • Using two columns as a two-level index
df2=df.set_index(["c","d"])
df2
Out[57]: 
       a  b
c   d      
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
  • Moving the index levels back into columns

df2.reset_index()
Out[59]: 
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1

df2.reset_index().sort_index(axis="columns")
Out[62]: 
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
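
The `set_index` and `reset_index` round trip can be verified programmatically. The sketch below rebuilds the same frame under the hypothetical name `df_demo` and restores the column order by selecting the original columns, which is an alternative to sorting the column labels.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})

# Columns c and d become the two index levels
indexed = df_demo.set_index(["c", "d"])
print(list(indexed.index.names))        # ['c', 'd']
print(indexed.loc[("two", 3), "a"])     # 6

# reset_index moves the levels back and places them first
restored = indexed.reset_index()
print(restored.columns.tolist())        # ['c', 'd', 'a', 'b']

# Selecting the original column order recovers the starting frame
roundtrip = restored[df_demo.columns]
print(roundtrip.equals(df_demo))        # True
```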