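How do you modify Series and DataFrame objects?
Here "modify" means changing the structure of these data types: adding, removing, or reordering rows and columns.
Reindexing
.reindex() changes or reorders the index of a Series or DataFrame.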
import pandas as pd
# Column labels: 城市 = city, 环比 = month-over-month, 同比 = year-over-year, 定基 = fixed-base index
dl = {'城市': ['beijing','shanghai','guangzhou','shengzheng','shengyang'],
      '环比': [101.5, 101.2, 101.3, 102.0, 100.1],
      '同比': [120.7, 127.3, 119.4, 140.9, 101.4],
      '定基': [121.4, 127.8, 120.0, 145.5, 101.6]}
d = pd.DataFrame(dl,index=["c1","c2","c3","c4","c5"])
d
Out[19]:
城市 环比 同比 定基
c1 beijing 101.5 120.7 121.4
c2 shanghai 101.2 127.3 127.8
c3 guangzhou 101.3 119.4 120.0
c4 shengzheng 102.0 140.9 145.5
c5 shengyang 100.1 101.4 101.6
## Reorder rows and columns with reindex
d = d.reindex(index=["c5","c4",'c3','c2','c1'])
d
Out[21]:
城市 环比 同比 定基
c5 shengyang 100.1 101.4 101.6
c4 shengzheng 102.0 140.9 145.5
c3 guangzhou 101.3 119.4 120.0
c2 shanghai 101.2 127.3 127.8
c1 beijing 101.5 120.7 121.4
d = d.reindex(columns=["城市","同比",'环比','定基'])
d
Out[23]:
城市 同比 环比 定基
c5 shengyang 101.4 100.1 101.6
c4 shengzheng 140.9 102.0 145.5
c3 guangzhou 119.4 101.3 120.0
c2 shanghai 127.3 101.2 127.8
c1 beijing 120.7 101.5 121.4
d = d.reindex(columns=["同比",'环比','定基'])
d
Out[26]:
同比 环比 定基
c5 101.4 100.1 101.6
c4 140.9 102.0 145.5
c3 119.4 101.3 120.0
c2 127.3 101.2 127.8
c1 120.7 101.5 121.4
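The examples above all use a DataFrame. The same call works on a Series; here is a minimal sketch with made-up data (the Series `s` and its labels are not from the original notes). It also shows what happens when a requested label does not exist yet.

```python
import pandas as pd

# Hypothetical sample data, not taken from the notes above
s = pd.Series([9, 8, 7], index=["a", "b", "c"])

# Reorder existing labels and request a label ("d") that is not in s
s2 = s.reindex(["c", "a", "d"])

print(s2)       # "d" has no source value, so it becomes NaN and the dtype becomes float64
print(s.index)  # s itself is unchanged: reindex returns a new object
```

Because reindex returns a new object, the earlier examples assign the result back with d = d.reindex(...).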
.reindex() accepts the following parameters:
[Figure: image.png, table of .reindex() parameters]
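The following example adds a new column to a DataFrame:
newc = d.columns.insert(len(d.columns), '新增')  # append the label '新增' ("new") after the last column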
newd = d.reindex(columns=newc, fill_value=200)
newd
Out[27]:
同比 环比 定基 新增
c5 101.4 100.1 101.6 200
c4 140.9 102.0 145.5 200
c3 119.4 101.3 120.0 200
c2 127.3 101.2 127.8 200
c1 120.7 101.5 121.4 200
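One detail worth checking is which cells fill_value actually touches. The sketch below uses a small hypothetical frame (`df`, with labels `r1`..`r3` and `x`, `y` invented for illustration) and suggests that only labels newly created by the reindex receive the fill value, while a NaN that was already in the data is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one pre-existing missing value
df = pd.DataFrame({"x": [1.0, np.nan]}, index=["r1", "r2"])

# "r3" and "y" are new labels, so their cells receive fill_value
out = df.reindex(index=["r1", "r2", "r3"], columns=["x", "y"], fill_value=0)
print(out)

# The NaN that was already at ("r2", "x") is kept; fillna() handles that case
print(out.fillna(0))
```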
Using the method parameter
import pandas as pd
import numpy as np
frame=pd.DataFrame(np.arange(9).reshape((3,3)),index=['a','c','d'],columns=['Ohio','Texas','California'])
frame
Out[50]:
Ohio Texas California
a 0 1 2
c 3 4 5
d 6 7 8
frame.reindex(['a','b','c','d'],method='ffill')
Out[46]:
Ohio Texas California
a 0 1 2
b 0 1 2
c 3 4 5
d 6 7 8
frame.reindex(['a','b','c','d'],method='bfill')
Out[47]:
Ohio Texas California
a 0 1 2
b 3 4 5
c 3 4 5
d 6 7 8
frame.reindex(['a','b','c','d'],method='pad')
Out[48]:
Ohio Texas California
a 0 1 2
b 0 1 2
c 3 4 5
d 6 7 8
frame.reindex(['a','b','c','d'],method='backfill')
Out[49]:
Ohio Texas California
a 0 1 2
b 3 4 5
c 3 4 5
d 6 7 8
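# As shown, ffill fills forward (a new label takes the value of the preceding existing label) and bfill fills backward (it takes the value of the next existing label); pad and backfill give the same results as ffill and bfill, respectively.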
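The method argument can be combined with limit, which caps how many consecutive new labels are filled. The following is a small sketch on a hypothetical Series (the data and the choice of limit=2 are assumptions for illustration). Note that method-based filling is only applicable when the index is monotonically increasing or decreasing, as in the sorted examples above.

```python
import pandas as pd

# Hypothetical Series with a sorted integer index
s = pd.Series(["x", "y"], index=[0, 5])

# Forward-fill at most 2 consecutive new labels after each existing label
filled = s.reindex(range(8), method="ffill", limit=2)
print(filled)
# Labels 1 and 2 take "x"; labels 3 and 4 stay NaN because the limit is used up;
# labels 6 and 7 take "y"
```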
Index types
[Figure: image.png, index types]
Common methods of index types
[Figure: image.png, common methods of index types]
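# d here is the original DataFrame again (index c1..c5; columns 城市, 环比, 同比, 定基)
nc = d.columns.delete(2)      # remove the column label at position 2 (同比)
ni = d.index.insert(5,"c0")   # add the row label "c0" at position 5, i.e. at the end
nd = d.reindex(index=ni,columns=nc).ffill()  # the new row c0 is forward-filled from c5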
nd
Out[76]:
城市 环比 定基
c1 beijing 101.5 121.4
c2 shanghai 101.2 127.8
c3 guangzhou 101.3 120.0
c4 shengzheng 102.0 145.5
c5 shengyang 100.1 101.6
c0 shengyang 100.1 101.6
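To make the index methods more concrete, here is a short sketch on a standalone Index with hypothetical labels. It uses insert and delete as in the example above, plus append, union, intersection, and difference, which are standard pandas Index methods. Index objects are immutable, so each call returns a new Index.

```python
import pandas as pd

idx = pd.Index(["c1", "c2", "c3"])  # hypothetical labels

inserted = idx.insert(1, "c0")            # new label at position 1
deleted = idx.delete(0)                   # drop the label at position 0
appended = idx.append(pd.Index(["c9"]))   # concatenate two Index objects
united = idx.union(pd.Index(["c3", "c4"]))
common = idx.intersection(pd.Index(["c2", "c3", "c4"]))
rest = idx.difference(pd.Index(["c1"]))

for name, value in [("insert", inserted), ("delete", deleted), ("append", appended),
                    ("union", united), ("intersection", common), ("difference", rest)]:
    print(name, list(value))

print(list(idx))  # the original Index is unchanged
```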
Dropping specified index labels
.drop() removes the specified row or column labels from a Series or DataFrame.
a = pd.Series([9,8,7,6],index=["a","b","c","d"])
a
Out[79]:
a 9
b 8
c 7
d 6
dtype: int64
a.drop(["b","c"])
Out[80]:
a 9
d 6
dtype: int64
d
Out[88]:
城市 环比 同比 定基
c1 beijing 101.5 120.7 121.4
c2 shanghai 101.2 127.3 127.8
c3 guangzhou 101.3 119.4 120.0
c4 shengzheng 102.0 140.9 145.5
c5 shengyang 100.1 101.4 101.6
d.drop("c5")
Out[89]:
城市 环比 同比 定基
c1 beijing 101.5 120.7 121.4
c2 shanghai 101.2 127.3 127.8
c3 guangzhou 101.3 119.4 120.0
c4 shengzheng 102.0 140.9 145.5
## To drop a column, pass axis=1 (the default is axis=0)
d.drop("同比",axis=1)
Out[90]:
城市 环比 定基
c1 beijing 101.5 121.4
c2 shanghai 101.2 127.8
c3 guangzhou 101.3 120.0
c4 shengzheng 102.0 145.5
c5 shengyang 100.1 101.6
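As an alternative to axis, drop also accepts the index= and columns= keywords, and errors="ignore" to skip labels that do not exist. A minimal sketch on a hypothetical frame (`df` and its labels are made up for illustration):

```python
import pandas as pd

# Hypothetical frame, not the one used in the notes above
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["r1", "r2"])

# Drop a row and a column in one call using keywords instead of axis
trimmed = df.drop(index="r2", columns="b")
print(trimmed)

# drop() returns a new object, so df still has all rows and columns
print(df.shape)

# A missing label raises KeyError unless errors="ignore" is passed
same = df.drop(columns="zzz", errors="ignore")
print(same.shape)
```

This also explains why d.drop("c5") above did not change d: the result has to be assigned to a name to be kept.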
Arithmetic operations on Series and DataFrame objects
a = pd.DataFrame(np.arange(12).reshape(3,4))
a
Out[96]:
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
b = pd.DataFrame(np.arange(20).reshape(4,5))
b
Out[97]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
a+b
Out[98]:
0 1 2 3 4
0 0.0 2.0 4.0 6.0 NaN
1 9.0 11.0 13.0 15.0 NaN
2 18.0 20.0 22.0 24.0 NaN
3 NaN NaN NaN NaN NaN
a*b
Out[99]:
0 1 2 3 4
0 0.0 1.0 4.0 9.0 NaN
1 20.0 30.0 42.0 56.0 NaN
2 80.0 99.0 120.0 143.0 NaN
3 NaN NaN NaN NaN NaN
# The operands are aligned automatically; positions missing from either operand are filled with NaN
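In the example above the labels happen to be the default integers, so it can look as if values are matched by position. The sketch below, with two hypothetical Series whose labels overlap only partly and appear in a different order, suggests that matching is done by label.

```python
import pandas as pd

# Hypothetical Series: labels overlap only partly and are ordered differently
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "d"])

total = s1 + s2
print(total)
# "b" and "c" exist in both operands and are added label by label;
# "a" and "d" exist in only one operand and become NaN
```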
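The four arithmetic operations can also be written as method calls,
which makes it possible to pass optional parameters.
[Figure: image.png]
## Methods: add, sub, mul, div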
b.add(a,fill_value=100)
Out[100]:
0 1 2 3 4
0 0.0 2.0 4.0 6.0 104.0
1 9.0 11.0 13.0 15.0 109.0
2 18.0 20.0 22.0 24.0 114.0
3 115.0 116.0 117.0 118.0 119.0
a.mul(b,fill_value=0)
Out[101]:
0 1 2 3 4
0 0.0 1.0 4.0 9.0 0.0
1 20.0 30.0 42.0 56.0 0.0
2 80.0 99.0 120.0 143.0 0.0
3 0.0 0.0 0.0 0.0 0.0
The operands are padded first and then the operation is applied; fill_value specifies the value used for padding.
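One edge case of fill_value is a position that is missing in both operands. The sketch below uses two hypothetical Series to illustrate it: where only one side is missing, the fill value is substituted before the operation, and where both sides are missing, the result stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is missing on one side, "c" on both sides
s1 = pd.Series([1.0, np.nan, np.nan], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, np.nan], index=["a", "b", "c"])

added = s1.add(s2, fill_value=0)
subtracted = s1.sub(s2, fill_value=0)

print(added)       # a: 1 + 10, b: 0 + 20, c: NaN (missing on both sides)
print(subtracted)  # a: 1 - 10, b: 0 - 20, c: NaN
```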
Operations between one-dimensional and multi-dimensional objects
b
Out[102]:
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
2 10 11 12 13 14
3 15 16 17 18 19
c = pd.Series(np.arange(4))
c
Out[104]:
0 0
1 1
2 2
3 3
dtype: int32
c-10
Out[105]:
0 -10
1 -9
2 -8
3 -7
dtype: int32
b-c
Out[106]:
0 1 2 3 4
0 0.0 0.0 0.0 0.0 NaN
1 5.0 5.0 5.0 5.0 NaN
2 10.0 10.0 10.0 10.0 NaN
3 15.0 15.0 15.0 15.0 NaN
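Operations between objects of different dimensions are broadcast: the lower-dimensional object is combined with every matching position of the higher-dimensional one. By default a one-dimensional Series takes part along axis 1, so its labels are matched against the column labels and it is applied to every row. To apply the operation along axis 0 instead, specify axis=0, for example: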
b.sub(c,axis=0)
Out[107]:
0 1 2 3 4
0 0 1 2 3 4
1 4 5 6 7 8
2 8 9 10 11 12
3 12 13 14 15 16
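## The Series is applied to each column
Using the method form of an operator lets a one-dimensional Series take part in the operation along axis 0.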
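A common use of the axis=0 form is subtracting a per-row statistic from each row. The following is a minimal sketch on a hypothetical frame (`df` and its column names are invented for illustration), where each row's mean is removed from that row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two rows and three columns
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["x", "y", "z"])

row_mean = df.mean(axis=1)           # one value per row, indexed by the row labels

# axis=0 matches row_mean's labels against the row labels of df
centered = df.sub(row_mean, axis=0)
print(centered)
```

With the plain operator df - row_mean, the Series labels would be matched against the column labels x, y, z instead, which is not what is wanted here.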
Comparison operations
## Rules for comparison operations
'''
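Comparison operations only compare elements that share the same index; no padding is performed.
Comparisons between 2-D and 1-D objects, and between 1-D objects and scalars (0-D), are broadcast.
Binary operations using the symbols >, <, >=, <=, ==, != produce Boolean values.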
'''
## Operands of the same dimension must have the same shape
a
Out[117]:
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
d = pd.DataFrame(np.arange(12,0,-1).reshape(3,4))
d
Out[114]:
0 1 2 3
0 12 11 10 9
1 8 7 6 5
2 4 3 2 1
a > d
Out[115]:
0 1 2 3
0 False False False False
1 False False False True
2 True True True True
a ==d
Out[116]:
0 1 2 3
0 False False False False
1 False False True False
2 False False False False
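# Operands of different dimensions are broadcast; by default this happens along axis 1, i.e. the Series is compared with each row.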
a > c
Out[118]:
0 1 2 3
0 False False False False
1 True True True True
2 True True True True
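The two rules above can be checked with a short sketch. The frames `left` and `right` and the Series `per_row` are hypothetical stand-ins with the same shapes as the examples in these notes. Comparing two frames with different labels using an operator raises an error rather than padding, and the method form gt (like sub earlier) accepts axis=0 to compare a Series down the rows.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame(np.arange(12).reshape(3, 4))
right = pd.DataFrame(np.arange(20).reshape(4, 5))
per_row = pd.Series(np.arange(3))  # one threshold per row of left

# Operators do not pad frames with different labels; pandas raises instead
try:
    left > right
except ValueError as exc:
    print("ValueError:", exc)

# Method form: match per_row against the row labels (axis=0)
by_row = left.gt(per_row, axis=0)
print(by_row)
```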