TensorFlow Study Notes: Image Classification with Inception v3


0. Introduction to the Google Inception Model

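Inception is a CNN model open-sourced by Google. Four versions have been published so far, each trained on data from the large-scale image database ImageNet, so we can use Google's Inception model directly for image classification. This article is based on the Inception_v3 model. Inception v3 has roughly 25 million parameters and needs about 5 billion multiply-add operations to classify a single image. Even so, a modern PC without a GPU can classify an image almost instantly.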
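
To put the two figures above in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the 4 bytes per parameter (32-bit floats) and the CPU throughput are assumptions chosen for illustration, not measurements.

```python
PARAMS = 25_000_000          # approximate parameter count quoted above
BYTES_PER_FLOAT32 = 4        # assumption: weights stored as 32-bit floats
MULT_ADDS = 5_000_000_000    # multiply-adds per image quoted above

weights_mb = PARAMS * BYTES_PER_FLOAT32 / 1e6

# Hypothetical sustained CPU throughput, chosen only for illustration.
ASSUMED_MULT_ADDS_PER_SECOND = 50e9
seconds_per_image = MULT_ADDS / ASSUMED_MULT_ADDS_PER_SECOND

print('%.0f MB of weights, about %.2f s per image' % (weights_mb, seconds_per_image))
```

Under these assumptions the weights take on the order of 100 MB and one image takes a fraction of a second, which is consistent with the claim that no GPU is needed for single-image inference.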

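1. Evolution of the Google Inception Model

The papers corresponding to the four Inception versions are listed below; the figure at the end of each line is the top-5 error rate on ILSVRC: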

[v1] Going Deeper with Convolutions: 6.67% test error
[v2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift: 4.8% test error
[v3] Rethinking the Inception Architecture for Computer Vision: 3.5% test error
[v4] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning: 3.08% test error

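2. Downloading the Inception_v3 Model

Inception_v3 model source code download
You could of course train an Inception_v3 model from scratch, but that takes a great deal of time and effort and is unnecessary. Retraining on top of the already trained Inception_v3 model, on the other hand, works fine and will be covered in a later article.
models/imagenet/classify_image.py
Google has published an official example, so you can also read the official code directly.
[Download] Pre-trained Inception_v3 model (Baidu Netdisk)
[Download] Pre-trained Inception_v3 model (official)
Either of the two links above can be used.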

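Extract the downloaded archive. It contains the following files:

[Figure: file list]

The classify_image_graph_def.pb file is the Inception_v3 model itself.
The contents of imagenet_2012_challenge_label_map_proto.pbtxt look like this:

[Figure: imagenet_2012_challenge_label_map_proto.pbtxt]

Each entry contains a target_class and a target_class_string. The former is the class code, running from 1 to 1000 (1k classes in total), referred to here as the Node_ID. The latter is an ID string of the form "n********", which can be thought of as an "address" or a "bridge", referred to here as the UID.
The contents of imagenet_synset_to_human_label_map.txt look like this:

[Figure: imagenet_synset_to_human_label_map.txt]

This file maps each UID to a class name; this textual class label is referred to as human_string.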
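
To make the relationship between the two files concrete, here is a minimal sketch that chains Node_ID -> UID -> human_string on two short inline samples. The sample entries and the helper names parse_label_map and parse_synset_map are illustrative assumptions, not part of the official example; the real files are far longer.

```python
# Illustrative samples in the layout of the two label files (assumed, heavily shortened).
LABEL_MAP_SAMPLE = '''entry {
  target_class: 449
  target_class_string: "n01440764"
}
entry {
  target_class: 450
  target_class_string: "n01443537"
}
'''
SYNSET_SAMPLE = "n01440764\ttench, Tinca tinca\nn01443537\tgoldfish, Carassius auratus\n"


def parse_label_map(text):
    """Return {node_id: uid} from pbtxt-style text."""
    node_id_to_uid = {}
    node_id = None
    for line in text.splitlines():
        key, _, value = line.strip().partition(': ')
        if key == 'target_class':
            node_id = int(value)
        elif key == 'target_class_string':
            # Drop the surrounding double quotes.
            node_id_to_uid[node_id] = value.strip('"')
    return node_id_to_uid


def parse_synset_map(text):
    """Return {uid: human_string} from tab-separated text."""
    uid_to_human = {}
    for line in text.splitlines():
        uid, _, human = line.partition('\t')
        if uid:
            uid_to_human[uid] = human
    return uid_to_human


node_id_to_uid = parse_label_map(LABEL_MAP_SAMPLE)
uid_to_human = parse_synset_map(SYNSET_SAMPLE)
# Join the two mappings through the UID.
node_id_to_name = {nid: uid_to_human[uid] for nid, uid in node_id_to_uid.items()}
print(node_id_to_name[449])
```

The UID acts purely as a join key: the model outputs a Node_ID, and the UID links it to the readable label.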

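3. Preparation

Download any image from the web and name it husky.jpg:

[Figure: husky.jpg]

The code below uses the Inception_v3 model to classify this husky image.

4. Code

First create a class NodeLookup that maps softmax probabilities to labels, then create a function create_graph() that reads the model and builds the graph, and finally read the husky image and classify it. The code is written against the TensorFlow 1.x API (tf.Session, tf.gfile, tf.GraphDef):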

# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np
#import re
import os

model_dir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/Inception_model'
image = 'C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/Images/husky.jpg'


# Converts integer class IDs into human-readable labels
class NodeLookup(object):
    def __init__(self, label_lookup_path=None, uid_lookup_path=None):
        if not label_lookup_path:
            # Default path of the "label_lookup_path" file.
            # It maps each class in the dataset (1-1000) to an address called target_class_string,
            # encoded as "n********", where each asterisk stands for a digit.
            label_lookup_path = os.path.join(
                    model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
        if not uid_lookup_path:
            # Default path of the "uid_lookup_path" file.
            # It maps each "n********" address/UID to the human-readable class name.
            uid_lookup_path = os.path.join(
                    model_dir, 'imagenet_synset_to_human_label_map.txt')
        self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

    def load(self, label_lookup_path, uid_lookup_path):
        if not tf.gfile.Exists(uid_lookup_path):
            # Check up front that the file exists
            tf.logging.fatal('File does not exist %s', uid_lookup_path)
        if not tf.gfile.Exists(label_lookup_path):
            # Check up front that the file exists
            tf.logging.fatal('File does not exist %s', label_lookup_path)


        # Loads mapping from string UID to human-readable string
        # (the dictionary uid_to_human, keyed by the ID string n********)

        # Read all lines of uid_lookup_path
        # readlines(): Returns all lines from the file in a list.
        # Leaves the '\n' at the end.
        proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()

        # Empty dictionary that will hold the mapping
        uid_to_human = {}
# =============================================================================
#         # Alternative: parse the file with a regular expression:
#         p = re.compile(r'[n\d]*[ \S,]*')
#         for line in proto_as_ascii_lines:
#             parsed_items = p.findall(line)
#             uid = parsed_items[0]
#             human_string = parsed_items[2]
#             uid_to_human[uid] = human_string
# =============================================================================
        # Simple approach to parsing the file:
        # read the data line by line
        for line in proto_as_ascii_lines:
            # Remove the trailing newline
            line = line.strip('\n')
            # Split on '\t' (tab), which divides the line into two parts
            parse_items = line.split('\t')
            # The class code, i.e. the UID
            uid = parse_items[0]
            # The class name
            human_string = parse_items[1]
            # Record the UID -> class name mapping in uid_to_human
            uid_to_human[uid] = human_string
            

        # Loads mapping from string UID to integer node ID.

        # Read the file that maps each UID (n********) to a class number 1-1000
        proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
        # Empty dictionary that will map node ID to UID
        node_id_to_uid = {}
        for line in proto_as_ascii:
            # Note the two leading spaces
            if line.startswith('  target_class:'):
                # The class number
                target_class = int(line.split(': ')[1])
            if line.startswith('  target_class_string:'):
                # The UID, still wrapped in double quotes, e.g. "n01484850"
                target_class_string = line.split(': ')[1]
                # [1:-2] drops the opening quote and the closing quote plus the trailing '\n'
                node_id_to_uid[target_class] = target_class_string[1:-2]
    
        # Loads the final mapping of integer node ID to human-readable string
        node_id_to_name = {}
        for key, val in node_id_to_uid.items():
            # Report an error if the UID is missing from uid_to_human
            if val not in uid_to_human:
                tf.logging.fatal('Failed to locate: %s', val)
            # The class name
            name = uid_to_human[val]
            # Build the class number -> class name mapping: key is node_id, value is name
            node_id_to_name[key] = name
    
        return node_id_to_name

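    # Takes a class number (1-1000) and returns the class name
    def id_to_string(self, node_id):
        # Return an empty string if the ID is unknown
        if node_id not in self.node_lookup:
            return ''
        return self.node_lookup[node_id]

# Reads the Inception_v3 model trained by Google and imports it into the default graph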
def create_graph():
    with tf.gfile.FastGFile(os.path.join(
            model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

# Read the image
image_data = tf.gfile.FastGFile(image, 'rb').read()

# Create the graph
create_graph()

# Create a session; the weights come from the imported Inception_v3 graph, so no variable initialization is needed
with tf.Session() as sess:
    # Output of the final softmax layer of Inception_v3
    # A name such as 'conv1' is a node name, while 'conv1:0' is a tensor name: the node's first output tensor
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    # Feed the JPEG bytes and get the softmax probabilities (an array of shape (1, 1008))
    predictions = sess.run(softmax_tensor,{'DecodeJpeg/contents:0': image_data})
    # Squeeze the result into a 1-D array
    predictions = np.squeeze(predictions)
    # Build the lookup: ID --> English string label.
    node_lookup = NodeLookup()
    # Sort and take the five highest probabilities (top-5)
    # argsort() returns the indices that would sort the array in ascending order
    top_5 = predictions.argsort()[-5:][::-1]
    for node_id in top_5:
        # The class name
        human_string = node_lookup.id_to_string(node_id)
        # The confidence score for this class
        score = predictions[node_id]
        print('%s (score = %.5f)' % (human_string, score))

Final output:

runfile('C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/test.py', wdir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3')
Siberian husky (score = 0.51033)
Eskimo dog, husky (score = 0.41048)
malamute, malemute, Alaskan malamute (score = 0.00653)
kelpie (score = 0.00136)
dogsled, dog sled, dog sleigh (score = 0.00133)
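
The top-5 step in the script relies on predictions.argsort()[-5:][::-1]: sort the indices by ascending score, keep the last five, then reverse them. A pure-Python sketch of the same selection on a small made-up score list shows what the slicing does. The scores and the helper name top_k_indices are assumptions for illustration only.

```python
def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first.

    Mirrors argsort()[-k:][::-1]: sort indices by ascending score,
    keep the last k, then reverse.
    """
    ascending = sorted(range(len(scores)), key=lambda i: scores[i])
    return ascending[-k:][::-1]


scores = [0.02, 0.51, 0.01, 0.41, 0.05]  # made-up probabilities
top_2 = top_k_indices(scores, 2)
print(top_2)
```

The returned values are indices into the score list, which is why the script can pass them straight to id_to_string() and use them to look up the score.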

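With a small change, the code can take multiple images as input and, for each one, print the image path, display the image, and print the predictions: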

# -*- coding: utf-8 -*-
"""
Created on Fri Oct  6 19:32:04 2017
test2: changes test to read a whole folder of images instead of a single image, and displays each image
@author: Dexter
"""

import tensorflow as tf
import numpy as np
#import re
import os
from PIL import Image
import matplotlib.pyplot as plt

model_dir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/Inception_model'
image = 'C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/Images/'


# Converts integer class IDs into human-readable labels
class NodeLookup(object):
    def __init__(self, label_lookup_path=None, uid_lookup_path=None):
        if not label_lookup_path:
            # Default path of the "label_lookup_path" file.
            # It maps each class in the dataset (1-1000) to an address called target_class_string,
            # encoded as "n********", where each asterisk stands for a digit.
            label_lookup_path = os.path.join(
                    model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
        if not uid_lookup_path:
            # Default path of the "uid_lookup_path" file.
            # It maps each "n********" address/UID to the human-readable class name.
            uid_lookup_path = os.path.join(
                    model_dir, 'imagenet_synset_to_human_label_map.txt')
        self.node_lookup = self.load(label_lookup_path, uid_lookup_path)

    def load(self, label_lookup_path, uid_lookup_path):
        if not tf.gfile.Exists(uid_lookup_path):
            # Check up front that the file exists
            tf.logging.fatal('File does not exist %s', uid_lookup_path)
        if not tf.gfile.Exists(label_lookup_path):
            # Check up front that the file exists
            tf.logging.fatal('File does not exist %s', label_lookup_path)


        # Loads mapping from string UID to human-readable string
        # (the dictionary uid_to_human, keyed by the ID string n********)

        # Read all lines of uid_lookup_path
        # readlines(): Returns all lines from the file in a list.
        # Leaves the '\n' at the end.
        proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()

        # Empty dictionary that will hold the mapping
        uid_to_human = {}
# =============================================================================
#         # Alternative: parse the file with a regular expression:
#         p = re.compile(r'[n\d]*[ \S,]*')
#         for line in proto_as_ascii_lines:
#             parsed_items = p.findall(line)
#             uid = parsed_items[0]
#             human_string = parsed_items[2]
#             uid_to_human[uid] = human_string
# =============================================================================
        # Simple approach to parsing the file:
        # read the data line by line
        for line in proto_as_ascii_lines:
            # Remove the trailing newline
            line = line.strip('\n')
            # Split on '\t' (tab), which divides the line into two parts
            parse_items = line.split('\t')
            # The class code, i.e. the UID
            uid = parse_items[0]
            # The class name
            human_string = parse_items[1]
            # Record the UID -> class name mapping in uid_to_human
            uid_to_human[uid] = human_string
            

        # Loads mapping from string UID to integer node ID.

        # Read the file that maps each UID (n********) to a class number 1-1000
        proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
        # Empty dictionary that will map node ID to UID
        node_id_to_uid = {}
        for line in proto_as_ascii:
            # Note the two leading spaces
            if line.startswith('  target_class:'):
                # The class number
                target_class = int(line.split(': ')[1])
            if line.startswith('  target_class_string:'):
                # The UID, still wrapped in double quotes, e.g. "n01484850"
                target_class_string = line.split(': ')[1]
                # [1:-2] drops the opening quote and the closing quote plus the trailing '\n'
                node_id_to_uid[target_class] = target_class_string[1:-2]
    
        # Loads the final mapping of integer node ID to human-readable string
        node_id_to_name = {}
        for key, val in node_id_to_uid.items():
            # Report an error if the UID is missing from uid_to_human
            if val not in uid_to_human:
                tf.logging.fatal('Failed to locate: %s', val)
            # The class name
            name = uid_to_human[val]
            # Build the class number -> class name mapping: key is node_id, value is name
            node_id_to_name[key] = name
    
        return node_id_to_name

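    # Takes a class number (1-1000) and returns the class name
    def id_to_string(self, node_id):
        # Return an empty string if the ID is unknown
        if node_id not in self.node_lookup:
            return ''
        return self.node_lookup[node_id]

# Reads the Inception_v3 model trained by Google and imports it into the default graph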
def create_graph():
    with tf.gfile.FastGFile(os.path.join(
            model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

# Create the graph
create_graph()

# Create a session; the weights come from the imported Inception_v3 graph, so no variable initialization is needed
with tf.Session() as sess:
    # Output of the final softmax layer of Inception_v3
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
    
    # Walk the image directory ('images/' is relative to the working directory;
    # the `image` variable defined at the top of the script is not used here)
    for root, dirs, files in os.walk('images/'):
        for file in files:
            # Load the image
            image_data = tf.gfile.FastGFile(os.path.join(root, file), 'rb').read()
            # Feed the JPEG bytes and get the softmax probabilities (an array of shape (1, 1008))
            predictions = sess.run(softmax_tensor,{'DecodeJpeg/contents:0': image_data})
            # Squeeze the result into a 1-D array
            predictions = np.squeeze(predictions)
    
            # Print the image path and file name
            image_path = os.path.join(root, file)
            print(image_path)
            # Display the image
            img = Image.open(image_path)
            plt.imshow(img)
            plt.axis('off')
            plt.show()
            
            # Build the lookup: ID --> English string label.
            node_lookup = NodeLookup()
            # Sort and take the five highest probabilities (top-5)
            # argsort() returns the indices that would sort the array in ascending order
            top_5 = predictions.argsort()[-5:][::-1]
            for node_id in top_5:
                # The class name
                human_string = node_lookup.id_to_string(node_id)
                # The confidence score for this class
                score = predictions[node_id]
                print('%s (score = %.5f)' % (human_string, score))
            print()

Final output:

runfile('C:/Users/Dexter/Documents/ML_files/171003_Inception_v3/test2.py', wdir='C:/Users/Dexter/Documents/ML_files/171003_Inception_v3')
images/dog.jpg

dingo, warrigal, warragal, Canis dingo (score = 0.46103)
Chihuahua (score = 0.05741)
Eskimo dog, husky (score = 0.04384)
dhole, Cuon alpinus (score = 0.04106)
Pembroke, Pembroke Welsh corgi (score = 0.02823)

images/husky.jpg

Siberian husky (score = 0.51033)
Eskimo dog, husky (score = 0.41048)
malamute, malemute, Alaskan malamute (score = 0.00653)
kelpie (score = 0.00136)
dogsled, dog sled, dog sleigh (score = 0.00133)
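
The directory loop in the script feeds every file it finds to the 'DecodeJpeg/contents:0' input, so a non-JPEG file in the folder would presumably cause a decoding error. One possible safeguard is to filter by extension first. The sketch below uses a hypothetical helper, find_jpegs, which is not part of the original script.

```python
import os
import tempfile


def find_jpegs(image_dir):
    """Return sorted paths of .jpg/.jpeg files under image_dir (recursive)."""
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if name.lower().endswith(('.jpg', '.jpeg')):
                paths.append(os.path.join(root, name))
    return sorted(paths)


# Small demonstration on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('husky.jpg', 'dog.JPEG', 'notes.txt'):
        open(os.path.join(tmp, name), 'wb').close()
    found = [os.path.basename(p) for p in find_jpegs(tmp)]

print(found)
```

In the script, the inner loop could then iterate over find_jpegs('images/') instead of every file returned by os.walk().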

5. Notes on Related Functions

tf.get_default_graph()
Returns the default graph for the current thread (it can be set with Graph.as_default())

Returns the default graph for the current thread.
The returned graph will be the innermost graph on which a Graph.as_default() context has been entered, or a global default graph if none has been explicitly created.
NOTE: The default graph is a property of the current thread. If you create a new thread, and wish to use the default graph in that thread, you must explicitly add a with g.as_default(): in that thread's function.

Returns:
The default Graph being used in the current thread.

tf.Graph.as_default()
Makes this Graph the default graph

Returns a context manager that makes this Graph the default graph.

tf.Graph.get_tensor_by_name()

All tensors have string names. You can list the names of the operations (graph nodes) that produce them as follows:
[node.name for node in tf.get_default_graph().as_graph_def().node]
Once you know an operation's name you can fetch its output Tensor using <name>:0 (0 is the index of the output, which looks redundant for operations with a single output)

import tensorflow as tf

c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
e = tf.matmul(c, d, name='example')

with tf.Session() as sess:
    test = sess.run(e)
    print (e.name)  
    #example:0
    #<name>:0 (0 refers to endpoint which is somewhat redundant)
    test = tf.get_default_graph().get_tensor_by_name("example:0")
    print (test)    
    #Tensor("example:0", shape=(2, 2), dtype=float32)
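
As the example shows, a tensor name has the form <op_name>:<output_index>. A small helper makes the convention explicit by splitting a tensor name into its two parts. The helper split_tensor_name is hypothetical and is not a TensorFlow function.

```python
def split_tensor_name(tensor_name):
    """Split 'op_name:index' into (op_name, index)."""
    op_name, sep, index = tensor_name.rpartition(':')
    if not sep or not index.isdigit():
        raise ValueError('expected "<op_name>:<output_index>", got %r' % tensor_name)
    return op_name, int(index)


print(split_tensor_name('softmax:0'))
print(split_tensor_name('DecodeJpeg/contents:0'))
```

Passing a bare operation name such as 'softmax' raises ValueError here, which mirrors the need to append the output index when calling get_tensor_by_name().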

References:

Tensorflow: How to get a tensor by name?

6. Some Improvements

6.1 Using PNG or other image formats instead of JPG as input

The shipped InceptionV3 graph used in classify_image.py only supports JPEG images out-of-the-box. There are two ways you could use this graph with PNG images:

Convert the PNG image to a height x width x 3 (channels) Numpy array, for example using PIL, then feed the 'DecodeJpeg:0' tensor:

import numpy as np
from PIL import Image
# ...

image = Image.open("example.png")
image_array = np.array(image)[:, :, 0:3]  # Select RGB channels only.

prediction = sess.run(softmax_tensor, {'DecodeJpeg:0': image_array})

Perhaps confusingly, 'DecodeJpeg:0' is the output of the DecodeJpeg op, so by feeding this tensor, you are able to feed raw image data.

Add a tf.image.decode_png() op to the imported graph. Simply switching the name of the fed tensor from 'DecodeJpeg/contents:0' to 'DecodePng/contents:0' does not work because there is no 'DecodePng' op in the shipped graph. You can add such a node to the graph by using the input_map argument to tf.import_graph_def():

png_data = tf.placeholder(tf.string, shape=[])
decoded_png = tf.image.decode_png(png_data, channels=3)
# ...

graph_def = ...
softmax_tensor = tf.import_graph_def(
    graph_def,
    input_map={'DecodeJpeg:0': decoded_png},
    return_elements=['softmax:0'])

sess.run(softmax_tensor, {png_data: ...})

The following code should handle both cases.

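import numpy as np
import tensorflow as tf
from PIL import Image

image_file = 'test.jpeg'
with tf.Session() as sess:

    # This snippet reads the output tensor 'final_result:0';
    # the scripts earlier in this article use 'softmax:0' instead.
    #     softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    if image_file.lower().endswith(('.jpg', '.jpeg')):
        # JPEG: feed the encoded bytes to the input of the DecodeJpeg op
        image_data = tf.gfile.FastGFile(image_file, 'rb').read()
        prediction = sess.run('final_result:0', {'DecodeJpeg/contents:0': image_data})
    elif image_file.lower().endswith('.png'):
        # PNG: decode with PIL and feed the pixel array in place of the DecodeJpeg output
        image = Image.open(image_file)
        image_array = np.array(image)[:, :, 0:3]
        prediction = sess.run('final_result:0', {'DecodeJpeg:0': image_array})
    else:
        raise ValueError('Unsupported image format: %s' % image_file)

    prediction = prediction[0]    
    print(prediction)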
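
The branching above boils down to choosing which tensor to feed based on the file extension. A sketch of that decision as a standalone function follows. The function name feed_tensor_for is an assumption; the two tensor names are the ones used in the snippets above.

```python
import os

# Encoded JPEG bytes go to the input of the DecodeJpeg op; images decoded
# elsewhere (e.g. a PNG opened with PIL) replace the output of that op.
JPEG_BYTES_INPUT = 'DecodeJpeg/contents:0'
DECODED_IMAGE_INPUT = 'DecodeJpeg:0'


def feed_tensor_for(image_file):
    """Return the name of the tensor to feed for the given image file."""
    ext = os.path.splitext(image_file)[1].lower()
    if ext in ('.jpg', '.jpeg'):
        return JPEG_BYTES_INPUT
    if ext == '.png':
        return DECODED_IMAGE_INPUT
    raise ValueError('unsupported image extension: %r' % ext)


print(feed_tensor_for('husky.JPG'))
print(feed_tensor_for('example.png'))
```

Keeping the decision in one place makes it easier to support further formats later, since each new format only needs to be decoded to a height x width x 3 array and fed to 'DecodeJpeg:0'.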

or shorter version with direct strings:

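image_file = 'test.png' # or 'test.jpeg'
image_data = tf.gfile.FastGFile(image_file, 'rb').read()
# String placeholder for the encoded image bytes; this snippet does not show
# how the placeholder is wired into the graph
ph = tf.placeholder(tf.string, shape=[])

with tf.Session() as sess:        
    # output_layer_name is not defined in this snippet; it stands for the name of the output tensor
    predictions = sess.run(output_layer_name, {ph: image_data} )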
