Keras Tutorial by Example (Part 3): Image Recognition with VGG-16

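Keras is one of the most popular deep learning frameworks today, and it is remarkably easy to use. Its friendly and flexible APIs make it much easier for beginners to pick up than TensorFlow. Better still, Keras ships with a number of popular neural network models that have already been trained: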

Model	Size	Top-1 Accuracy	Top-5 Accuracy	Parameters	Depth
Xception	88 MB	0.790	0.945	22,910,480	126
VGG16	528 MB	0.713	0.901	138,357,544	23
VGG19	549 MB	0.713	0.900	143,667,240	26
ResNet50	99 MB	0.749	0.921	25,636,712	168
InceptionV3	92 MB	0.779	0.937	23,851,784	159
InceptionResNetV2	215 MB	0.803	0.953	55,873,736	572
MobileNet	16 MB	0.704	0.895	4,253,864	88
MobileNetV2	14 MB	0.713	0.901	3,538,984	88
DenseNet121	33 MB	0.750	0.923	8,062,504	121
DenseNet169	57 MB	0.762	0.932	14,307,880	169
DenseNet201	80 MB	0.773	0.936	20,242,984	201
NASNetMobile	23 MB	0.744	0.919	5,326,716	-
NASNetLarge	343 MB	0.825	0.960	88,949,818	-

A Brief Introduction to the VGG Architecture

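With these models, you can easily build on other people's work to solve your own problems. This article uses VGG16 as an example to show how to perform object recognition in Keras. VGG16 is a CNN-based deep learning network designed and implemented by the Visual Geometry Group at the University of Oxford. It has 16 weight layers (13 convolutional and 3 fully connected); the depth of 23 in the table above also counts the input, pooling and flatten layers. Its weights total more than 500 MB. The figure below shows its basic structure (see column D):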

[Figure: VGG ConvNet configurations; column D is VGG-16]

The following figure gives a clearer picture:

[Figure: VGG-16 architecture]

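In short, the structure is as follows:
VGG-16 takes a 224x224x3 input, which passes through two identical convolutional layers (3x3 filters, stride 1, 64 filters) followed by a 2x2 max-pooling layer. The same pattern then repeats (blocks 3 to 5 use three convolutions each): the width and height keep shrinking while the number of channels doubles from block to block until it reaches 512. The network ends with two identical fully connected layers of 4096 units each, followed by a 1000-way fully connected layer with softmax. As a flowchart: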

[Figure: VGG-16 flowchart]

A clearer VGG architecture diagram is available here.
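To make the structure described above concrete, here is a small pure-Python sketch (no Keras needed) that walks configuration D layer by layer, tracking the feature-map shape and counting weights. It assumes "same" padding for the 3x3 convolutions and 2x2, stride-2 max pooling, as in the VGG paper; the names `VGG16_CONV_CFG`, `FC_UNITS` and `vgg16_param_count` are hypothetical. The total it computes matches the 138,357,544 parameters in the table, and at 4 bytes per float32 weight this comes to roughly 528 MiB, which is why the weight download is so large.

```python
# Configuration "D" (VGG-16): numbers are 3x3 conv filter counts,
# "M" is a 2x2 max-pooling layer with stride 2.
VGG16_CONV_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                  512, 512, 512, "M", 512, 512, 512, "M"]
FC_UNITS = [4096, 4096, 1000]  # two hidden FC layers + 1000-way softmax output


def vgg16_param_count(input_shape=(224, 224, 3)):
    """Walk the VGG-16 layers, returning (conv_params, fc_params, final_feature_shape)."""
    h, w, c = input_shape
    conv_params = 0
    for item in VGG16_CONV_CFG:
        if item == "M":
            # 2x2 max pooling with stride 2 halves height and width; it has no weights
            h, w = h // 2, w // 2
        else:
            # 3x3 conv, stride 1, "same" padding: spatial size is unchanged
            conv_params += 3 * 3 * c * item + item  # kernel weights + biases
            c = item
    fc_params = 0
    fan_in = h * w * c  # the feature map is flattened before the FC layers
    for units in FC_UNITS:
        fc_params += fan_in * units + units  # weights + biases
        fan_in = units
    return conv_params, fc_params, (h, w, c)


conv, fc, final_shape = vgg16_param_count()
total = conv + fc
print("final feature map:", final_shape)  # (7, 7, 512)
print("conv params:", f"{conv:,}")        # 14,714,688
print("fc params:", f"{fc:,}")            # 123,642,856
print("total params:", f"{total:,}")      # 138,357,544
# each float32 weight takes 4 bytes
print("float32 size: %.1f MiB" % (total * 4 / 2**20))  # 527.8 MiB
```

Note that almost 90% of the parameters sit in the first fully connected layer (7x7x512 inputs times 4096 units), which is one reason `include_top=False` models are so much lighter.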

Using VGG-16

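You can load the pre-trained VGG16 network directly with the commands below. Because its weights total more than 500 MB, Keras has to download them from the internet the first time you run these commands, which may take a while.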

from keras.applications.vgg16 import VGG16
model = VGG16()
# summary() prints the layer table itself (it returns None, so don't wrap it in print)
model.summary()

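The last line prints the layer structure of VGG16. In addition, the VGG16() function accepts several arguments that make it easy to customize the network, which is especially useful for transfer learning. Some of them are listed below: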

include_top (True): Whether to include the three fully connected layers at the top of the network. You don't need these if you are fitting the model to your own problem.
weights ('imagenet'): Which weights to load. Specify None to start from random weights if you want to train the model yourself from scratch.
input_tensor (None): An optional Keras tensor to use as the model's image input, e.g. if you intend to fit the model on new data of a different size.
input_shape (None): The shape of the input images if you change the input layer. Only valid with include_top=False; it must have exactly 3 channels, and width and height must be no smaller than 32.
pooling (None): The global pooling applied to the output of the last convolutional block when include_top=False: None, 'avg' or 'max'. Useful when you are training a new set of output layers.
classes (1000): The number of classes (i.e. the size of the output vector). Only used when include_top=True and no pre-trained weights are loaded.
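The pooling argument only matters when include_top=False. The NumPy sketch below uses a random array as a hypothetical stand-in for VGG-16's last convolutional output at a 224x224 input, shape (batch, 7, 7, 512), and shows what each option does to that shape: None keeps the 4D feature map, while 'avg' and 'max' collapse the spatial axes into one 512-dimensional vector per image, a convenient input for a new classifier in transfer learning. This only mimics the shapes and arithmetic; it does not call Keras.

```python
import numpy as np

# Hypothetical stand-in for VGG-16's last conv block output with
# include_top=False and input_shape=(224, 224, 3): (batch, 7, 7, 512).
rng = np.random.default_rng(0)
features = rng.random((2, 7, 7, 512), dtype=np.float32)

# pooling=None: the 4D feature map is returned as-is
no_pool = features
# pooling='avg': global average pooling over the height and width axes
avg_pool = features.mean(axis=(1, 2))
# pooling='max': global max pooling over the height and width axes
max_pool = features.max(axis=(1, 2))

print(no_pool.shape)   # (2, 7, 7, 512)
print(avg_pool.shape)  # (2, 512)
print(max_pool.shape)  # (2, 512)
```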

To get recognition results directly from VGG-16, set include_top=True (the default) so that the output layers are included.

Loading and Preprocessing the Image

Prepare an image to classify. This one shows a golden retriever:

[Figure: test image of a golden retriever]

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions

from keras.preprocessing.image import load_img, img_to_array
import numpy as np

# load the image and resize it to the 224x224 input size VGG-16 expects
image = load_img('C:/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)

# add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)

# prepare the image data for VGG
image_data = preprocess_input(image_data)

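Notes:

The input image size is 224x224;
The data must be reshaped to (samples, height, width, channels), i.e. a batch of input samples, each of shape (224, 224, 3);
Finally, preprocess_input() converts the data into the form VGG-16 accepts; essentially it subtracts the mean value from each pixel (as described in the original paper):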

The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.
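In Keras, preprocess_input for VGG16 does slightly more than the paper's mean subtraction: in its default "caffe" mode it also flips the channels from RGB to BGR before subtracting the per-channel ImageNet means, and it does not rescale pixels to [0, 1]. The sketch below is a hypothetical NumPy re-implementation for channels-last data (`vgg_preprocess_sketch` is not a Keras function), combined with the batch-axis reshape from the notes above. It is meant for understanding only; in real code keep calling preprocess_input.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras's "caffe"-style preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_preprocess_sketch(rgb_image):
    """Hypothetical helper: (H, W, 3) RGB array -> (1, H, W, 3) zero-centered BGR batch."""
    x = np.asarray(rgb_image, dtype=np.float32)
    x = np.expand_dims(x, axis=0)  # add batch axis, same as reshape((1,) + x.shape)
    x = x[..., ::-1]               # reverse channel order: RGB -> BGR
    return x - IMAGENET_BGR_MEAN   # subtract each channel's mean; no scaling to [0, 1]


# Example: a 224x224 image in which every pixel is pure red (R=255, G=0, B=0)
red = np.zeros((224, 224, 3), dtype=np.float32)
red[..., 0] = 255.0
batch = vgg_preprocess_sketch(red)
print(batch.shape)     # (1, 224, 224, 3)
print(batch[0, 0, 0])  # B, G, R after mean subtraction: approx. -103.939, -116.779, 131.32
```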

Prediction and Decoding

# using the pre-trained model to predict
prediction = model.predict(image_data)

# decode the prediction results
results = decode_predictions(prediction, top=3)

print(results)

This gives the three most likely labels:

[[('n02099601', 'golden_retriever', 0.9698627), ('n04409515', 'tennis_ball', 0.008626293), ('n02100877', 'Irish_setter', 0.004562445)]]

The result matches the image: the model is about 97% confident that it shows a golden retriever.
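decode_predictions turns the raw 1000-dimensional softmax output into readable labels. Conceptually, it sorts each row of scores, keeps the top entries, and looks up each index's WordNet ID and class name in the ImageNet class index, which Keras downloads as a JSON file on first use. The sketch below mimics that behavior with a tiny hypothetical 5-class table and made-up scores; `decode_top_k` and `CLASS_INDEX` are illustrative names, not part of Keras.

```python
import numpy as np

# Hypothetical 5-class label table in the (WordNet ID, name) form used by
# decode_predictions; the real ImageNet table has 1000 entries.
CLASS_INDEX = [
    ("n02099601", "golden_retriever"),
    ("n04409515", "tennis_ball"),
    ("n02100877", "Irish_setter"),
    ("n02085620", "Chihuahua"),
    ("n01440764", "tench"),
]


def decode_top_k(probs, top=3):
    """Return, for each row of probs, the top-k (wnid, name, score) tuples, highest first."""
    results = []
    for row in probs:
        best = np.argsort(row)[::-1][:top]  # indices of the k largest scores, descending
        results.append([CLASS_INDEX[i] + (float(row[i]),) for i in best])
    return results


# Made-up softmax output for a batch containing one image
probs = np.array([[0.90, 0.05, 0.03, 0.015, 0.005]])
print(decode_top_k(probs, top=3))
# [[('n02099601', 'golden_retriever', 0.9), ('n04409515', 'tennis_ball', 0.05), ('n02100877', 'Irish_setter', 0.03)]]
```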

Complete code:

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions

from keras.preprocessing.image import load_img, img_to_array
import numpy as np
# VGG-16 instance
model = VGG16(weights='imagenet', include_top=True)

image = load_img('C:/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)

# reshape it into the specific format
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)

# prepare the image data for VGG
image_data = preprocess_input(image_data)

# using the pre-trained model to predict
prediction = model.predict(image_data)

# decode the prediction results
results = decode_predictions(prediction, top=3)

print(results)