A Guide to pandas DataFrame Collection Operations Without the Confusion

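Collection-style operations on a pandas DataFrame always leave me a bit dizzy, mainly because this kind of operation is rare in Scala or Java; working with two-dimensional arrays is probably what I am least good at.
That said, pandas operations really are convenient, almost magical.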

First, initialize a DataFrame:

%python
import pandas as pd
import numpy as np
df2 = pd.DataFrame({ 'A' : [1,2,3,4],'B' : pd.Timestamp('20130102'),'C' : pd.Series(1,index=list(range(4)),dtype='float32'),'D' : np.array([3] * 4,dtype='int32'), 'E' : pd.Categorical(["test","train","test","train"]),'F' : 'foo' })

Inspect the resulting DataFrame:

%python
df2

This produces the output:

---out---
   A          B    C  D      E    F
0  1 2013-01-02  1.0  3   test  foo
1  2 2013-01-02  1.0  3  train  foo
2  3 2013-01-02  1.0  3   test  foo
3  4 2013-01-02  1.0  3  train  foo
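
The constructor above mixes a list, scalars, a Series, a NumPy array, and a Categorical; the scalars (the Timestamp and 'foo') are broadcast to every row. As a minimal sketch, the frame can be rebuilt under the hypothetical name `df_demo` to check which dtype each column ends up with:

```python
import numpy as np
import pandas as pd

# Rebuild the same frame as df2 above; df_demo is just an illustrative name.
df_demo = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': pd.Timestamp('20130102'),  # scalar, broadcast to all 4 rows
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo',  # scalar, broadcast to all 4 rows
})

# One dtype per column
print(df_demo.dtypes)
# (number of rows, number of columns)
print(df_demo.shape)
```

Knowing the dtypes up front helps when reading the outputs further down, for example why column E prints as `category` and column D as `int32`.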

Next, select a single column:

%python
df2['E']

The result is:

---out---
0     test
1    train
2     test
3    train
Name: E, dtype: category
Categories (2, object): [test, train]

Next, test which rows of that column equal a given value; the result is one boolean per row index:

%python
df2['E']=="train"

The result is:

---out---
0    False
1     True
2    False
3     True
Name: E, dtype: bool

Index the DataFrame with that boolean Series to output the rows where it is True:

%python
df2[df2['E']=="train"]

The result is:

---out---
   A          B    C  D      E    F
1  2 2013-01-02  1.0  3  train  foo
3  4 2013-01-02  1.0  3  train  foo
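
Boolean Series like this can also be combined before they are used as a filter. Here is a small sketch, assuming a hypothetical frame `df_demo` that carries only the A and E columns of df2. Note that pandas combines masks with `&`, `|`, and `~` (with parentheses around each comparison) rather than Python's `and`, `or`, and `not`:

```python
import pandas as pd

# Illustrative stand-in for df2 with just the two columns needed here.
df_demo = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'E': pd.Categorical(["test", "train", "test", "train"]),
})

# Rows where E is "train" AND A is greater than 2
both = df_demo[(df_demo['E'] == "train") & (df_demo['A'] > 2)]

# Rows where E is NOT "train"
not_train = df_demo[~(df_demo['E'] == "train")]

print(both)
print(not_train)
```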

For the rows where the boolean is True, output the values of one particular column:

%python
df2[df2['E']=="train"]["D"]

The result is:

---out---
1    3
3    3
Name: D, dtype: int32
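
The same selection can also be written in one step with `.loc`, which takes the row mask and the column label together. This is a sketch, assuming a small stand-in frame `df_demo` (a hypothetical name) that mirrors the D and E columns of df2:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df2 with just the two columns needed here.
df_demo = pd.DataFrame({
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
})

mask = df_demo['E'] == "train"

# Chained form: filter the rows first, then pick the column
chained = df_demo[mask]["D"]

# Single-step form: rows and column in one .loc call
single = df_demo.loc[mask, "D"]

print(single.equals(chained))
```

For reading values, both forms give the same Series. The difference matters mostly when assigning, where the single-step form is the safer habit.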

You can even combine operations across multiple DataFrames.
Create a new df3 that differs slightly from df2:

%python
df3 = pd.DataFrame({ 'A' : [1,2,3,4],'B' : pd.Timestamp('20130102'),'C' : pd.Series(1,index=list(range(4)),dtype='float32'),'D' : np.array([3] * 4,dtype='int32'), 'E' : pd.Categorical(["test","train","fa","train"]),'F' : 'zoo' })
df3

---out---
   A          B    C  D      E    F
0  1 2013-01-02  1.0  3   test  zoo
1  2 2013-01-02  1.0  3  train  zoo
2  3 2013-01-02  1.0  3     fa  zoo
3  4 2013-01-02  1.0  3  train  zoo

Then use the row index derived from df3 to modify the value in the matching row and column of df2:
```
%python
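# Assign in one step with .loc; the chained form df2["C"][mask] = 3.0 can end up writing to a temporary copy
df2.loc[df3["E"] == "fa", "C"] = 3.0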
df2
```

---out---
   A          B    C  D      E    F
0  1 2013-01-02  1.0  3   test  foo
1  2 2013-01-02  1.0  3  train  foo
2  3 2013-01-02  3.0  3   test  foo
3  4 2013-01-02  1.0  3  train  foo

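We can see that the value in the third row of column C was changed from 1.0 to 3.0. In effect, we used the boolean Series df3["E"]=="fa" as the row index into df2, which works because df2 and df3 share the same row labels (0 to 3).
If you come to pandas from an object-oriented language and are not aware of this, it is very easy to get thoroughly confused.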
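
One way to see what is going on: the comparison on one frame yields a boolean Series labelled by that frame's row index, and pandas matches those labels against the row index of the frame being modified. The sketch below uses two small hypothetical frames, `left` and `right`, and performs the assignment with `.loc`, which writes to the frame in a single step:

```python
import pandas as pd

# Two illustrative frames that share the row index 0..3
left = pd.DataFrame({'C': [1.0, 1.0, 1.0, 1.0]})
right = pd.DataFrame({'E': ["test", "train", "fa", "train"]})

# Boolean Series built from `right`, labelled by its row index
mask = right['E'] == "fa"

# Use the mask from `right` to choose rows of `left`, then set column C
left.loc[mask, 'C'] = 3.0

print(left)
```

This relies on both frames carrying the same row labels; if the labels differ, the result may not be what you expect, so it is worth comparing the two indexes first.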