Handling NaN in pandas

Assigning NaN

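import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4))
# Set rows 0-1 of columns 2-3 to NaN (.loc slices are label-based and include both endpoints)
t.loc[:1, 2:] = np.nan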
print(t)
>>>
   0  1     2     3
0  0  1   NaN   NaN
1  4  5   NaN   NaN
2  8  9  10.0  11.0
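
Columns 2 and 3 print as floats above because NaN is a floating-point value, so pandas converts those integer columns to float64. The following is a minimal sketch, assuming you would rather make that conversion explicit before assigning (recent pandas releases may warn when an assignment silently changes a column's dtype). The `astype` call is an addition for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4))

# Cast columns 2 and 3 to float first, so assigning NaN does not change their dtype
t = t.astype({2: "float64", 3: "float64"})
t.loc[:1, 2:] = np.nan  # rows 0-1, columns 2-3

print(t.dtypes)  # columns 0 and 1 stay integer, columns 2 and 3 are float64
```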

Checking for NaN

# Find which values are NaN, option 1
print(pd.isnull(t))  # True marks a NaN
>>>
       0      1      2      3
0  False  False   True   True
1  False  False   True   True
2  False  False  False  False

# Find which values are NaN, option 2
print(pd.notnull(t))  # False marks a NaN
>>>
      0     1      2      3
0  True  True  False  False
1  True  True  False  False
2  True  True   True   True
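
Both checks are also available as DataFrame methods (`isna` and `notna`), and because True counts as 1, the mask can be summed to count missing values. The sketch below rebuilds the same `t`. The per-column count is an extra illustration, not something the original walkthrough does.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4)).astype({2: float, 3: float})
t.loc[:1, 2:] = np.nan

mask = t.isna()              # same result as pd.isnull(t)
nan_per_column = mask.sum()  # True counts as 1, so this counts the NaN values in each column
print(nan_per_column)
```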

# Select the rows where column 2 is not NaN: first build a boolean mask
print(pd.notnull(t[2]))
>>>
0    False
1    False
2     True
Name: 2, dtype: bool

print(type(pd.notnull(t[2])))
>>>
<class 'pandas.core.series.Series'>

print(t[pd.notnull(t[2])])
>>>
   0  1     2     3
2  8  9  10.0  11.0

Dropping NaN

# axis=0 drops rows; here every row that contains a NaN is dropped
print(t.dropna(axis=0))  # same as t.dropna(axis=0, how="any"): a row is dropped if it contains any NaN
>>>
   0  1     2     3
2  8  9  10.0  11.0

print(t.dropna(axis=0, subset=[2]))  # drop the rows where column 2 is NaN; same result as print(t[pd.notnull(t[2])])
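
The claim that the two approaches match can be checked directly. A small sketch that rebuilds the same `t` and compares the results with `DataFrame.equals`; the variable names `by_mask` and `by_dropna` are made up for this illustration.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4)).astype({2: float, 3: float})
t.loc[:1, 2:] = np.nan

by_mask = t[pd.notnull(t[2])]             # boolean-mask filtering
by_dropna = t.dropna(axis=0, subset=[2])  # dropna restricted to column 2

print(by_mask.equals(by_dropna))
```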

print(t.dropna(axis=0, how="all"))  # drop a row only if every value in it is NaN; pass inplace=True to modify t itself
>>>
   0  1     2     3
0  0  1   NaN   NaN
1  4  5   NaN   NaN
2  8  9  10.0  11.0
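
Nothing is dropped above because no row of `t` is entirely NaN. To make the difference between `how="any"` and `how="all"` visible, here is a sketch on a hypothetical frame `df` (not from the original example) in which one row is completely empty. It also shows `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is partly NaN, row 1 is entirely NaN, row 2 is complete
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

only_complete = df.dropna(how="any")  # keeps only rows without any NaN
drop_empty = df.dropna(how="all")     # drops only rows that are entirely NaN

df.dropna(how="all", inplace=True)    # modifies df itself and returns None
print(df)
```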

Filling NaN

print(t.fillna(100))  # fill every NaN with the number 100
>>>
   0  1      2      3
0  0  1  100.0  100.0
1  4  5  100.0  100.0
2  8  9   10.0   11.0

print(t.fillna(t.mean()))  # fill each NaN with the mean of its column; NaN values are skipped when the mean is computed
>>>
   0  1     2     3
0  0  1  10.0  11.0
1  4  5  10.0  11.0
2  8  9  10.0  11.0
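
In `t` each of these columns has a single non-NaN value, so the skipping is hard to see. A minimal sketch on a hypothetical Series `s` (not from the original example) makes it more visible: the mean is taken over the two present values only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

mean_skipping_nan = s.mean()            # NaN is skipped: (1 + 3) / 2
mean_with_nan = s.mean(skipna=False)    # with skipna=False the NaN propagates
filled = s.fillna(mean_skipping_nan)    # the gap is filled with the mean of the present values

print(mean_skipping_nan, mean_with_nan)
print(filled)
```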

# Fill the NaN values in column 2 (the column labeled 2) with 200
tmp = t[2].fillna(200)  # returns a Series
print(tmp)
>>>
0    200.0
1    200.0
2     10.0
Name: 2, dtype: float64

t[2] = tmp  # write the filled column back into t
print(t)
>>>
   0  1      2     3
0  0  1  200.0   NaN
1  4  5  200.0   NaN
2  8  9   10.0  11.0
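
As an alternative to filling the column and assigning it back, `fillna` also accepts a dict that maps column labels to fill values. A sketch on the same data, assuming you want a new DataFrame rather than a modified `t`; the name `filled` is made up for this illustration.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape(3, 4)).astype({2: float, 3: float})
t.loc[:1, 2:] = np.nan

# Only column 2 is listed in the dict, so the NaN values in column 3 are left alone
filled = t.fillna({2: 200})
print(filled)
```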
