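Keras is one of the most popular deep learning frameworks today and is extremely easy to use. Its friendly, flexible APIs are easy to pick up even for beginners, especially compared with TensorFlow. Notably, Keras also ships with a number of popular pre-trained neural network models: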
Model | Size | Top-1 Accuracy | Top-5 Accuracy | Parameters | Depth |
---|---|---|---|---|---|
Xception | 88 MB | 0.790 | 0.945 | 22,910,480 | 126 |
VGG16 | 528 MB | 0.713 | 0.901 | 138,357,544 | 23 |
VGG19 | 549 MB | 0.713 | 0.900 | 143,667,240 | 26 |
ResNet50 | 99 MB | 0.749 | 0.921 | 25,636,712 | 168 |
InceptionV3 | 92 MB | 0.779 | 0.937 | 23,851,784 | 159 |
InceptionResNetV2 | 215 MB | 0.803 | 0.953 | 55,873,736 | 572 |
MobileNet | 16 MB | 0.704 | 0.895 | 4,253,864 | 88 |
MobileNetV2 | 14 MB | 0.713 | 0.901 | 3,538,984 | 88 |
DenseNet121 | 33 MB | 0.750 | 0.923 | 8,062,504 | 121 |
DenseNet169 | 57 MB | 0.762 | 0.932 | 14,307,880 | 169 |
DenseNet201 | 80 MB | 0.773 | 0.936 | 20,242,984 | 201 |
NASNetMobile | 23 MB | 0.744 | 0.919 | 5,326,716 | - |
NASNetLarge | 343 MB | 0.825 | 0.960 | 88,949,818 | - |
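VGG architecture overview
Users can conveniently borrow these pre-trained models to solve their own problems. This article uses VGG16 as an example to show how to perform object recognition in Keras. VGG16 is a CNN-based deep network designed and implemented by a research team at the University of Oxford. Keras reports its depth as 23 (of which 16 are weight layers), and its weights total more than 500 MB. The figure below shows its basic structure (see column D):
The figure below makes the structure clearer:
In brief, the structure is:
VGG-16 takes a 224x224x3 input and passes it through two identical convolutional layers (3x3 filters, stride 1, 64 filters), followed by one pooling layer. The same pattern then repeats, with the width and height shrinking while the number of channels doubles each time until it reaches 512. The network ends with two identical fully connected layers followed by a softmax layer. As a flowchart:
A clearer diagram of the VGG structure is available here.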
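The summary above can be checked with a little arithmetic. The following is a minimal sketch in plain Python, not Keras code: it assumes the standard configuration D layout (five blocks of 3x3 convolutions with "same" padding, each followed by 2x2 max pooling, then fully connected layers of 4096, 4096, and 1000 units). The names `BLOCKS`, `FC_UNITS`, and `trace_vgg16` are made up for illustration.

```python
# VGG-16 (configuration D): (number of 3x3 conv layers, output channels) per block.
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
# Two identical fully connected layers, then the 1000-way softmax layer.
FC_UNITS = [4096, 4096, 1000]


def trace_vgg16(size=224, channels=3):
    """Return (feature-map shape after each block, total parameter count)."""
    shapes, params = [], 0
    for n_convs, filters in BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel weights plus one bias per filter; stride 1 with
            # "same" padding leaves the spatial size unchanged.
            params += (3 * 3 * channels + 1) * filters
            channels = filters
        size //= 2  # 2x2 max pooling with stride 2 halves width and height
        shapes.append((size, size, channels))
    features = size * size * channels  # flattened input to the first FC layer
    for units in FC_UNITS:
        params += (features + 1) * units  # weights plus biases
        features = units
    return shapes, params


shapes, total = trace_vgg16()
print(shapes)        # [(112, 112, 64), (56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 512)]
print(f"{total:,}")  # 138,357,544
```

The total agrees with the parameter count listed for VGG16 in the table above, and the shapes show the width and height halving while the channels grow to 512.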
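Using VGG-16
The commands below load the pre-trained VGG16 network directly. Because the parameters total more than 500 MB, Keras has to download them the first time you run this, which may take a while.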
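from keras.applications.vgg16 import VGG16
# load VGG16 with its default pre-trained ImageNet weights
model = VGG16()
# summary() prints the layer table itself and returns None
model.summary()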
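The last statement prints the layer-by-layer structure of the VGG16 network. In addition, VGG16() accepts several arguments that make it easy to customize the network, which is especially useful in transfer learning. Some of them are listed below: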
- include_top (True): Whether or not to include the output layers for the model. You don’t need these if you are fitting the model on your own problem.
- weights ('imagenet'): What weights to load. You can specify None to not load pre-trained weights if you are interested in training the model yourself from scratch.
- input_tensor (None): A new input layer if you intend to fit the model on new data of a different size.
- input_shape (None): The size of images that the model is expected to take if you change the input layer.
- pooling (None): The type of pooling to use when you are training a new set of output layers.
- classes (1000): The number of classes (e.g. size of output vector) for the model.
To have VGG-16 output recognition results directly, keep include_top enabled so that the output layers are included.
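For the opposite case, transfer learning, a minimal sketch might look like the following. It is an illustration under assumptions rather than a recommended recipe: the helper name `build_transfer_model` and the choice of a single softmax layer as the new head are hypothetical, and the frozen convolutional base is one common option among several.

```python
from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.models import Model


def build_transfer_model(num_classes, weights="imagenet"):
    """VGG16 convolutional base with a new softmax head (hypothetical helper)."""
    # include_top=False drops the original output layers;
    # pooling="avg" reduces the last feature map to one 512-d vector per image.
    base = VGG16(weights=weights, include_top=False,
                 input_shape=(224, 224, 3), pooling="avg")
    base.trainable = False  # freeze the pre-trained layers
    # New output layer sized for the target problem.
    outputs = Dense(num_classes, activation="softmax")(base.output)
    return Model(inputs=base.input, outputs=outputs)
```

Calling `build_transfer_model(5)` would download the ImageNet weights without the top layers on first use. Passing `weights=None` builds the same architecture with random initialization, which is handy for checking shapes offline.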
Loading and preprocessing the image
Prepare an image to classify; here it shows a golden retriever (golden_retriever):
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array
import numpy as np
image = load_img('C:/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)
# reshape it into the specific format
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)
# prepare the image data for VGG
image_data = preprocess_input(image_data)
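Notes:
- The input image must be 224x224;
- The array must be reshaped to (samples, dims), that is, a batch of input samples each with dims (224, 224, 3);
- Finally, preprocess_input() converts the data into the form VGG-16 accepts, which essentially subtracts the mean from each pixel (as described in the original paper):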
The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.
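To make the reshape and mean-subtraction steps concrete, here is a NumPy sketch of what this preprocessing amounts to. It assumes Keras' default "caffe"-style behaviour for VGG16: channels are reordered from RGB to BGR and the ImageNet per-channel means are subtracted, with no further scaling. The function name `vgg_preprocess` is made up for illustration and is not part of Keras; in real code, keep using preprocess_input().

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match Keras' VGG16 preprocessing).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg_preprocess(batch_rgb):
    """Approximate preprocess_input for VGG16 on a (samples, H, W, 3) RGB batch."""
    x = np.asarray(batch_rgb, dtype=np.float32)
    x = x[..., ::-1]              # reorder channels: RGB -> BGR
    return x - IMAGENET_MEAN_BGR  # zero-center each channel


# A uniform gray image stands in for the array returned by img_to_array().
image_data = np.full((224, 224, 3), 128.0, dtype=np.float32)
batch = np.expand_dims(image_data, axis=0)  # same effect as the reshape above
print(batch.shape)                          # (1, 224, 224, 3)
print(vgg_preprocess(batch)[0, 0, 0])       # one pixel after preprocessing
```

`np.expand_dims(image_data, axis=0)` is an alternative to the explicit reshape and reads a little more clearly as "add a batch dimension".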
Predicting and decoding the results
# using the pre-trained model to predict
prediction = model.predict(image_data)
# decode the prediction results
results = decode_predictions(prediction, top=3)
print(results)
This gives the three most probable predictions:
[[('n02099601', 'golden_retriever', 0.9698627), ('n04409515', 'tennis_ball', 0.008626293), ('n02100877', 'Irish_setter', 0.004562445)]]
The prediction matches the image: the model assigns about 97% probability to golden retriever.
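Conceptually, decode_predictions() sorts the 1000 output probabilities and maps the highest-scoring indices to class names. The sketch below shows only the ranking step with NumPy; the helper `top_k` and the three-element label list are toy stand-ins, not the real ImageNet class index.

```python
import numpy as np


def top_k(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = np.asarray(probabilities)
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(labels[i], float(probs[i])) for i in order]


# Toy example with made-up labels and scores.
toy_labels = ["cat", "dog", "ball"]
print(top_k([0.1, 0.7, 0.2], toy_labels, k=2))  # [('dog', 0.7), ('ball', 0.2)]
```

With the real model, each entry returned by decode_predictions() additionally carries the WordNet ID, as in the output shown above.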
Complete code:
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing.image import load_img, img_to_array
import numpy as np
# VGG-16 instance
model = VGG16(weights='imagenet', include_top=True)
image = load_img('C:/Pictures/test_imgs/golden.jpg', target_size=(224, 224))
image_data = img_to_array(image)
# reshape it into the specific format
image_data = image_data.reshape((1,) + image_data.shape)
print(image_data.shape)
# prepare the image data for VGG
image_data = preprocess_input(image_data)
# using the pre-trained model to predict
prediction = model.predict(image_data)
# decode the prediction results
results = decode_predictions(prediction, top=3)
print(results)