This article explores how to tell whether a model built with PyTorch is overfitting, underfitting, or suffering from a data problem.
How do you know whether your model is overfitting, underfitting, or has a data problem?
1) Learning curves
from sklearn.model_selection import learning_curve
2) Cross-validation
from sklearn.model_selection import ShuffleSplit
cv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)
3) We can first compare training-set and test-set accuracy for an intuitive judgment of whether the model overfits; when that comparison is inconclusive, fall back on the learning curve.
A detailed introduction can be found in the reference articles listed at the end.
A runnable learning-curve example
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.model_selection import ShuffleSplit
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=None,
                        n_jobs=1, train_sizes=np.linspace(.1, 1.0, 5)):
    plt.figure()
    plt.title(title)
    if ylim is not None:
        plt.ylim(*ylim)
    plt.xlabel("Training examples")
    plt.ylabel("Score")
    # learning_curve returns the absolute training-set sizes actually used;
    # plot against these, not the input fractions.
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    train_scores_mean = np.mean(train_scores, axis=1)
    train_scores_std = np.std(train_scores, axis=1)
    test_scores_mean = np.mean(test_scores, axis=1)
    test_scores_std = np.std(test_scores, axis=1)
    plt.grid()
    plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
                     train_scores_mean + train_scores_std, alpha=0.1,
                     color="r")
    plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
                     test_scores_mean + test_scores_std, alpha=0.1, color="g")
    plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
             label="Training score")
    plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
             label="Cross-validation score")
    plt.legend(loc="best")
    return plt
digits = load_digits()
X, y = digits.data, digits.target
title = "Learning Curves (GaussianNB)"
cv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)
estimator = GaussianNB()
plot_learning_curve(estimator, title, X, y, ylim=(0.7, 1.01), cv=cv, n_jobs=4)
plt.show()
Overfitting means the model fits the training data too closely, even learning the noise in it, so its predictive performance on the test set suffers.
Causes of overfitting
· The amount of training data is small relative to the model's complexity;
· The feature distributions of the training set and the test set differ;
· The noise in the samples is so large that the model memorizes the noise features and ignores the true input-output relationship;
· Too many weight-update iterations (overtraining).
Strategies for overcoming overfitting
1· Use dropout
2· Use L2/L1 regularization
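As a minimal sketch of this idea (the layer sizes and dropout rate here are illustrative assumptions, not from the original article), dropout in PyTorch is just another layer, active in train() mode and disabled in eval() mode:

```python
import torch
import torch.nn as nn

# Illustrative two-layer MLP with dropout between the layers.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations during training
    nn.Linear(64, 10),
)

model.train()            # dropout active
out_train = model(torch.randn(32, 128))

model.eval()             # dropout disabled; outputs rescaled automatically
out_eval = model(torch.randn(32, 128))
```

Remember to call model.eval() before validation or inference, otherwise dropout keeps perturbing the outputs.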
torch.optim provides many optimizers, such as SGD, Adadelta, Adam, Adagrad, and RMSprop. These optimizers take a weight_decay parameter that specifies the weight-decay rate, equivalent to the λ parameter in L2 regularization. L2 regularization:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
Limitation: the torch.optim optimizers implement only L2 regularization via weight_decay; they cannot apply an L1 penalty.
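If an L1 penalty is wanted, it can be added to the loss by hand. A minimal sketch (the toy linear model and the l1_lambda value are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
l1_lambda = 1e-4  # illustrative L1 strength

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = criterion(model(x), y)

# Sum of absolute parameter values, added to the training loss.
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = loss + l1_lambda * l1_penalty
loss.backward()
```

Because the penalty is part of the loss graph, autograd propagates its gradient to every parameter along with the data-fitting term.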
3· Decrease the batch_size
4· Collect more data
5· Normalize activations with BatchNorm
PyTorch provides three BatchNorm variants, BatchNorm1d, BatchNorm2d, and BatchNorm3d; choose the one matching the dimensionality of your data. A BatchNorm layer is used like any ordinary layer.
bn0 = torch.nn.BatchNorm1d(d1)  # d1 = number of features of the normalized layer
bn1 = torch.nn.BatchNorm1d(d1)
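A minimal usage sketch (the feature sizes here are illustrative assumptions): BatchNorm1d is placed after a fully connected layer and normalizes each feature across the batch.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(20, 50),
    nn.BatchNorm1d(50),  # normalizes the 50 features over the batch dimension
    nn.ReLU(),
    nn.Linear(50, 2),
)

net.train()              # in train mode, batch statistics are used and running stats updated
out = net(torch.randn(16, 20))
```

Note that BatchNorm1d needs a batch size greater than 1 in training mode, since it computes per-batch statistics.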
References:
【1】sklearn model tuning (judging overfitting and choosing parameters)
【2】Overfitting (4 causes and 6 solutions)
【3】Solutions to overfitting in deep learning (PyTorch implementations)
【4】Underfitting, overfitting, and how to fix them