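Recently I was training a model to recognize multi-digit handwritten numbers. I used ModelCheckpoint to save the parameters from the epoch with the best result during training, and later, when using the model to predict new samples, I loaded those trained weights directly from disk. The code is as follows: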
from keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense, Dropout
from keras.models import Model

def build_model():
    # None of the layers below gets an explicit name=, so Keras auto-generates their names
    input_ = Input(shape=(64, 64, 1))
    # block1
    model = Conv2D(64, (5, 5), activation='relu', padding='same')(input_)
    model = Conv2D(64, (5, 5), activation='relu', padding='same')(model)
    model = MaxPool2D(pool_size=(2, 2))(model)
    # block2
    model = Conv2D(128, (5, 5), activation='relu', padding='same')(model)
    model = Conv2D(128, (5, 5), activation='relu', padding='same')(model)
    model = MaxPool2D(pool_size=(2, 2))(model)
    # block3
    model = Conv2D(256, (5, 5), activation='relu', padding='same')(model)
    model = Conv2D(256, (5, 5), activation='relu', padding='same')(model)
    model = Conv2D(256, (5, 5), activation='relu', padding='same')(model)
    model = MaxPool2D(pool_size=(2, 2))(model)
    # # block4
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = MaxPool2D(pool_size=(2, 2))(model)
    # # block5
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = MaxPool2D(pool_size=(2, 2))(model)
    # fully connected layer
    model = Flatten()(model)
    model = Dense(1024, activation='relu')(model)
    model = Dropout(0.6)(model)
    # two softmax output heads, 11 classes each
    x1 = Dense(11, activation='softmax')(model)
    x2 = Dense(11, activation='softmax')(model)
    x = [x1, x2]
    model = Model(inputs=input_, outputs=x)
    return model

model = build_model()
# Load the best weights saved by ModelCheckpoint; by_name=True loads weights by layer name
model.load_weights('best_weights.h5', by_name=True)
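For context, ModelCheckpoint with save_best_only=True only overwrites its file when the monitored metric improves, which is how a file like best_weights.h5 ends up holding the best epoch. The toy below imitates that logic in plain Python; the function name, the validation losses and the dict-based "weights" are all hypothetical, and the real callback does considerably more.

```python
import copy

def train_with_checkpoint(val_losses, weights_per_epoch):
    """Toy imitation of ModelCheckpoint(monitor='val_loss', mode='min', save_best_only=True).

    Returns the saved "checkpoint": the epoch index and a copy of that epoch's weights.
    """
    best_loss = float("inf")
    checkpoint = None
    for epoch, (loss, weights) in enumerate(zip(val_losses, weights_per_epoch)):
        if loss < best_loss:  # only overwrite the file when the monitored metric improves
            best_loss = loss
            checkpoint = {"epoch": epoch, "weights": copy.deepcopy(weights)}
    return checkpoint

# Hypothetical validation losses and weights for four epochs.
val_losses = [0.90, 0.40, 0.55, 0.47]
weights_per_epoch = [{"dense": [float(i)]} for i in range(4)]
saved = train_with_checkpoint(val_losses, weights_per_epoch)
print(saved)  # {'epoch': 1, 'weights': {'dense': [1.0]}}
```

In real Keras this would correspond to something like ModelCheckpoint('best_weights.h5', monitor='val_loss', save_best_only=True, save_weights_only=True); that exact configuration is an assumption, since the training code is not shown here.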
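I found that although the trained model performed very well on both the test set and the validation set, its predictions were very poor, and they changed every time I ran it. That made me suspect the trained parameters were not being used at all: the model was probably running with randomly initialized weights, which would explain both the poor results and the run-to-run variation. Thinking it over, the only place the problem could come from was the loading call model.load_weights('best_weights.h5', by_name=True), so I looked into the by_name parameter.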
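The reasoning above, that weights which never get loaded stay at their random initial values and therefore change on every run, can be illustrated with a toy one-layer "model" in plain Python. build_model, class_scores and the checkpoint list are hypothetical stand-ins, not Keras calls.

```python
import copy
import random

def build_model(n_inputs=4, n_classes=3, rng=None):
    """Hypothetical stand-in for build_model(): one dense layer with random initial weights."""
    if rng is None:
        rng = random.Random()  # unseeded, so every call draws new weights
    return [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_classes)]

def class_scores(weights, x):
    """Linear score for each class; the predicted class would be the argmax."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

checkpoint = build_model(rng=random.Random(0))  # hypothetical "trained" weights on disk
x = [0.5, -1.0, 2.0, 0.1]                       # one hypothetical input sample

# Weights never loaded: each run starts from a new random initialization.
run_a, run_b = build_model(), build_model()
print(class_scores(run_a, x) == class_scores(run_b, x))  # expected False: fresh random weights

# Weights actually loaded: every run reproduces the same output.
loaded_a, loaded_b = copy.deepcopy(checkpoint), copy.deepcopy(checkpoint)
print(class_scores(loaded_a, x) == class_scores(loaded_b, x))  # True
```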
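The official Keras documentation explains the parameter. In short, with by_name=False (the default) weights are loaded according to the network's topology, i.e. layer by layer in order, which assumes the architecture is the same as the one that saved them; with by_name=True weights are loaded only into layers whose names match the names stored in the file. I had not named any of my layers, so they received auto-generated names, and with by_name=True Keras presumably could not find matching layers. Layers without a match are simply skipped and keep their random initial weights, which would explain why the results were poor and different on every run. After switching back to by_name=False, the predictions became stable and quite good.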
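The difference between the two loading modes, and how unnamed layers can silently miss under by_name=True, can be reproduced with a toy loader in plain Python. Everything here (auto_name, build_model, load_weights, loaded_layers and the layer names) is a hypothetical stand-in rather than the Keras implementation; it assumes Keras-style auto-naming, where each new unnamed layer takes the next free name in the session, so building the model a second time produces names that no longer match the saved file.

```python
import itertools
from collections import defaultdict

# Hypothetical per-session counters imitating Keras auto-naming of unnamed layers
# ("conv2d", then "conv2d_1", ...). Real naming details differ across Keras versions.
_counters = defaultdict(itertools.count)

def auto_name(kind):
    i = next(_counters[kind])
    return kind if i == 0 else f"{kind}_{i}"

def build_model(names=None):
    """Toy model: a list of [layer_name, weights]; None stands for random initial weights."""
    kinds = ["conv2d", "conv2d", "dense"]
    if names is None:
        names = [auto_name(k) for k in kinds]
    return [[name, None] for name in names]

def load_weights(model, saved, by_name=False):
    """Toy loader. `saved` is a list of (layer_name, weights) in the order they were written."""
    if by_name:
        lookup = dict(saved)
        for layer in model:
            if layer[0] in lookup:  # layers without a matching name are skipped silently
                layer[1] = lookup[layer[0]]
    else:
        if len(saved) != len(model):
            raise ValueError("topological loading needs the same number of layers")
        for layer, (_, weights) in zip(model, saved):  # match purely by position
            layer[1] = weights
    return model

def loaded_layers(model):
    return sum(weights is not None for _, weights in model)

# Training: the model is built once and its weights are saved.
trained = build_model()                                  # conv2d, conv2d_1, dense
saved = [(name, f"trained_{name}") for name, _ in trained]

# Same session, model built again: the auto-generated names have moved on.
rebuilt = build_model()                                  # conv2d_2, conv2d_3, dense_1
print(loaded_layers(load_weights(rebuilt, saved, by_name=True)))  # 0 -> still random
print(loaded_layers(load_weights(build_model(), saved)))          # 3 -> loaded by position

# Explicit names (hypothetical, as with Conv2D(..., name=...)) are identical in every run.
layer_names = ["block1_conv1", "block1_conv2", "digit_out"]
saved_named = [(n, f"trained_{n}") for n in layer_names]
named = build_model(names=layer_names)
print(loaded_layers(load_weights(named, saved_named, by_name=True)))  # 3
```

In real Keras the corresponding fixes are either to call model.load_weights('best_weights.h5') with the default by_name=False, or to pass an explicit name= to every layer when building the model so the names are the same in the training and prediction scripts.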