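I. Resampling
1. Undersampling
Undersampling randomly removes part of the majority class (the class with more samples) so that the class sizes become comparable.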
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# credit_df is assumed to be already loaded, with a binary 'Class' column (1 = fraud, 0 = non-fraud).
# Shuffle the dataset.
shuffled_df = credit_df.sample(frac=1, random_state=4)
# Put all fraud samples (the minority class) in a separate dataframe.
fraud_df = shuffled_df.loc[shuffled_df['Class'] == 1]
# Randomly draw 492 observations from the non-fraud (majority) class so that both classes end up balanced.
non_fraud_df = shuffled_df.loc[shuffled_df['Class'] == 0].sample(n=492, random_state=42)
# Concatenate both dataframes to get a balanced dataset.
normalized_df = pd.concat([fraud_df, non_fraud_df])
# Plot the class counts after undersampling.
plt.figure(figsize=(8, 8))
# Pass the column as x=...; recent seaborn versions no longer accept it positionally.
sns.countplot(x='Class', data=normalized_df)
plt.title('Balanced Classes')
plt.show()
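The steps above hard-code the minority class size (492). A minimal sketch of the same idea that derives the sample size from the data is shown here. The helper name `undersample`, its parameters, and the toy dataframe are assumptions made for illustration and are not part of the original code.

```python
import pandas as pd


def undersample(df, label_col="Class", random_state=42):
    """Randomly downsample every class to the size of the smallest class."""
    # Size of the smallest class, used as the target size for all classes.
    n_min = df[label_col].value_counts().min()
    # Draw n_min rows without replacement from each class.
    parts = [
        group.sample(n=n_min, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle the result so the classes are not stored in contiguous blocks.
    balanced = pd.concat(parts).sample(frac=1, random_state=random_state)
    return balanced.reset_index(drop=True)


# Toy data (hypothetical): 9 majority-class rows and 3 minority-class rows.
toy_df = pd.DataFrame({"Amount": range(12), "Class": [0] * 9 + [1] * 3})
balanced_df = undersample(toy_df)
print(balanced_df["Class"].value_counts())
```

Applied to the fraud data, this would be called as `undersample(credit_df)`. Because the target size is taken from `value_counts()`, the call would not need to change if the number of fraud samples changed.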