1. Import the numpy package
import numpy as np
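2. Construct the sample data
Generate Chinese and math scores for 25 students (50 scores in total), drawn from a normal distribution with a mean of 65 and a standard deviation of 20, rounded to one decimal place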
np.random.seed(1)
data = np.random.normal(65,20, [25,2]).round(1)
print(data)
[[ 97.5 52.8]
[ 54.4 43.5]
[ 82.3 19. ]
[ 99.9 49.8]
[ 71.4 60. ]
[ 94.2 23.8]
[ 58.6 57.3]
[ 87.7 43. ]
[ 61.6 47.4]
[ 65.8 76.7]
[ 43. 87.9]
[ 83. 75. ]
[ 83. 51.3]
[ 62.5 46.3]
[ 59.6 75.6]
[ 51.2 57.1]
[ 51.3 48.1]
[ 51.6 64.7]
[ 42.7 69.7]
[ 98.2 79.8]
[ 61.2 47.2]
[ 50.1 98.8]
[ 66. 52.3]
[ 68.8 107. ]
[ 67.4 77.3]]
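As a quick sanity check on the generated sample, the sketch below computes the mean and standard deviation of each column. It assumes column 0 holds the Chinese scores and column 1 the math scores; the text does not label the columns, so that ordering is only a guess. With just 25 rows, the sample statistics should land near 65 and 20 but will not match them exactly.

```python
import numpy as np

np.random.seed(1)
data = np.random.normal(65, 20, [25, 2]).round(1)

# Assumption: column 0 = Chinese, column 1 = math (not stated in the text).
col_mean = data.mean(axis=0)         # one mean per column
col_std = data.std(axis=0, ddof=1)   # sample standard deviation per column

print("shape:", data.shape)
print("mean per subject:", col_mean.round(1))
print("std per subject:", col_std.round(1))
```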
3. Inspect the outliers
- Scores below 30 or above 100
data[(data>100) | (data<30)]
array([ 19. , 23.8, 107. ])
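Boolean indexing returns the outlier values but not where they sit in the array. A minimal sketch of one way to recover their positions is to pass the same mask to np.argwhere; the names mask and positions are illustrative and do not appear in the original.

```python
import numpy as np

np.random.seed(1)
data = np.random.normal(65, 20, [25, 2]).round(1)

# Same condition as above, kept as a boolean mask with the shape of data
mask = (data > 100) | (data < 30)

# Each row of positions is a (row index, column index) pair
positions = np.argwhere(mask)
for row, col in positions:
    print(f"row {row}, column {col}: {data[row, col]}")
```

In the printed data above, all three outliers fall in the second column.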
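4. Replace the outliers
- Replace values below 30 with 30 and values above 100 with 100
The clip function limits the values of an array to a given range. For example, if all data should lie in [0, 1], values below 0 are set to 0 and values above 1 are set to 1.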
data1 = np.clip(data, a_min=30, a_max=100)
data1[(data1<30) | (data1>100)]
array([], dtype=float64)
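To make the effect of clipping more visible, the sketch below compares the array before and after, and rebuilds the same result with np.maximum and np.minimum. The names clipped, manual, and changed are illustrative only. Note that np.clip returns a new array here, so data itself still contains the original outliers.

```python
import numpy as np

np.random.seed(1)
data = np.random.normal(65, 20, [25, 2]).round(1)

# Clip to the range [30, 100]; data is left untouched
clipped = np.clip(data, 30, 100)

# Equivalent formulation: raise to the lower bound, then cap at the upper bound
manual = np.minimum(np.maximum(data, 30), 100)

# Which entries did clipping actually modify?
changed = data != clipped
print("values changed:", int(changed.sum()))
print("before:", data[changed])
print("after: ", clipped[changed])
print("same as min/max version:", np.array_equal(clipped, manual))
```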