A small gotcha with Keras model.load_weights

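I was recently training a model to recognize handwritten multi-digit numbers, and I used ModelCheckpoint to save the weights from the best epoch of training. Later, to predict on new samples, I rebuilt the model and loaded those trained weights directly from disk with the following code: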

```python
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPool2D
from keras.models import Model


def build_model():
    input_ = Input(shape=(64, 64, 1))

    # block1
    model = Conv2D(64, (5, 5), activation='relu', padding='same')(input_)
    model = Conv2D(64, (5, 5), activation='relu', padding='same')(model)
    model = MaxPool2D(pool_size=(2, 2))(model)
    # block2
    model = Conv2D(128, (5, 5), activation='relu', padding='same')(model)
    model = Conv2D(128, (5, 5), activation='relu', padding='same')(model)
    model = MaxPool2D(pool_size=(2, 2))(model)
    # block3
    model = Conv2D(256, (5, 5), activation='relu', padding='same')(model)
    model = Conv2D(256, (5, 5), activation='relu', padding='same')(model)
    model = Conv2D(256, (5, 5), activation='relu', padding='same')(model)
    model = MaxPool2D(pool_size=(2, 2))(model)
    # block4 (disabled)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = MaxPool2D(pool_size=(2, 2))(model)
    # block5 (disabled)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = Conv2D(512, (3, 3), activation='relu', padding='same')(model)
    # model = MaxPool2D(pool_size=(2, 2))(model)

    # fully connected layer
    model = Flatten()(model)
    model = Dense(1024, activation='relu')(model)
    model = Dropout(0.6)(model)

    # two softmax output heads, 11 classes each
    x1 = Dense(11, activation='softmax')(model)
    x2 = Dense(11, activation='softmax')(model)

    model = Model(inputs=input_, outputs=[x1, x2])
    return model


model = build_model()
# None of the layers above is given a name=, so Keras names them automatically.
# by_name=True is the call that caused the problem described below.
model.load_weights('best_weights.h5', by_name=True)
```

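I found that although the trained model had performed well on both the test and validation sets, its predictions here were very poor and changed from run to run. That suggested the trained weights were not being used at all and the model was running on randomly initialized parameters, which would explain both the poor accuracy and the run-to-run variation. The only place this could go wrong was the loading call model.load_weights('best_weights.h5', by_name=True), so I looked into the by_name argument.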
Here is the official documentation:

[Screenshots of the Keras documentation for load_weights]

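In other words, with by_name=False (the default) the weights are loaded according to the network's topology, layer by layer in order, which requires the architecture to match the one that was saved; with by_name=True they are loaded by layer name, and only into layers whose names appear in the file. I never named my layers, so Keras gave them automatic names (such as conv2d_1). These names depend on how many layers of each type have already been created in the current session (building the model more than once during training, for example, shifts the numbering), so they most likely did not match the names stored in the file. Layers without a match are skipped without any error, so the model kept its random initial weights, which is why the predictions were poor and differed on every run. After switching back to by_name=False, the predictions became stable and quite good.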
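The mechanism can be illustrated with a toy simulation in plain Python. It does not call Keras: NameCounter, build_toy_model, fake_train, save_weights, load_weights and count_loaded are hypothetical stand-ins that mimic the documented behaviour (by_name=False loads by position, by_name=True loads only into layers whose names match and silently skips the rest). It also assumes, as one plausible cause, that the model was built twice in the training session, so the saved names carry higher suffixes than those of a freshly built model; Keras's real auto-name format varies between versions.

```python
from collections import defaultdict

# The weighted layers of build_model() in order: 7 Conv2D, Dense(1024),
# and the two Dense(11) heads. Pooling/Flatten/Dropout hold no weights.
LAYER_TYPES = ["conv2d"] * 7 + ["dense"] * 3


class NameCounter:
    """Hypothetical stand-in for Keras's per-session auto-naming counter."""

    def __init__(self):
        self.counts = defaultdict(int)

    def next_name(self, prefix):
        self.counts[prefix] += 1
        return f"{prefix}_{self.counts[prefix]}"


def build_toy_model(counter, names=None):
    """Return layers as dicts; 'random' marks untrained initial weights."""
    layers = []
    for i, layer_type in enumerate(LAYER_TYPES):
        name = names[i] if names else counter.next_name(layer_type)
        layers.append({"name": name, "weights": "random"})
    return layers


def fake_train(model):
    for layer in model:
        layer["weights"] = "trained:" + layer["name"]


def save_weights(model):
    # The toy weight "file" keeps (name, weights) pairs in layer order.
    return [(layer["name"], layer["weights"]) for layer in model]


def load_weights(model, saved, by_name=False):
    if by_name:
        lookup = dict(saved)
        for layer in model:
            # Layers whose name is absent from the file are skipped, no error.
            if layer["name"] in lookup:
                layer["weights"] = lookup[layer["name"]]
    else:
        # Topological loading: the i-th saved entry goes to the i-th layer.
        if len(saved) != len(model):
            raise ValueError("number of layers does not match the file")
        for layer, (_, weights) in zip(model, saved):
            layer["weights"] = weights


def count_loaded(model):
    return sum(layer["weights"] != "random" for layer in model)


# Training session: the model is built twice (e.g. a re-run notebook cell),
# so the saved layers are named conv2d_8..conv2d_14 and dense_4..dense_6.
train_counter = NameCounter()
build_toy_model(train_counter)
trained = build_toy_model(train_counter)
fake_train(trained)
saved = save_weights(trained)

# Prediction session: a fresh process starts naming at conv2d_1 / dense_1.
model = build_toy_model(NameCounter())
load_weights(model, saved, by_name=True)
print(count_loaded(model))  # 0: no name matches, every layer stays random

model = build_toy_model(NameCounter())
load_weights(model, saved, by_name=False)
print(count_loaded(model))  # 10: all layers loaded by position

# With explicit (hypothetical) names, name-based loading matches in any session.
NAMES = [f"block_conv{i}" for i in range(1, 8)] + ["fc", "digit1", "digit2"]
named = build_toy_model(NameCounter(), NAMES)
fake_train(named)
named_saved = save_weights(named)
model = build_toy_model(NameCounter(), NAMES)
load_weights(model, named_saved, by_name=True)
print(count_loaded(model))  # 10
```

The last scenario corresponds to passing name= to each Keras layer (for example Conv2D(64, (5, 5), name='block_conv1')), which is what makes by_name=True dependable; without explicit names, the default by_name=False is the safer choice as long as the architecture is unchanged.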
