Training Mask R-CNN with TensorFlow and deploying inference with OpenCV

Install labelme (Python 3)

sudo pip3 install labelme

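After labeling, augment the data. A data augmentation tool, json-rotate, is available on GitHub.

Note: after a JSON annotation is rotated, the mask must stay entirely inside the image; otherwise training will fail with an error.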
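The constraint above can be checked before building the dataset. The following is a minimal sketch, assuming the annotations use the labelme JSON layout (`imageWidth`, `imageHeight`, and a `shapes` list whose entries carry `label` and `points`). The function names and the exact bounds (`0 <= x < width`, `0 <= y < height`) are assumptions for illustration, not part of json-rotate.

```python
import json


def shapes_outside_image(annotation):
    """Return (label, (x, y)) pairs for polygon points outside the image.

    Assumes a labelme-style dict with imageWidth, imageHeight and shapes.
    """
    width = annotation["imageWidth"]
    height = annotation["imageHeight"]
    offenders = []
    for shape in annotation.get("shapes", []):
        for x, y in shape["points"]:
            # Assumed bounds: a point is valid if it lies inside the pixel grid.
            if not (0 <= x < width and 0 <= y < height):
                offenders.append((shape["label"], (x, y)))
    return offenders


def check_file(json_path):
    """Load one annotation file and report its out-of-bounds points."""
    with open(json_path, "r", encoding="utf-8") as f:
        return shapes_outside_image(json.load(f))
```

Running `check_file` over every rotated JSON file and discarding those with a non-empty result is one way to keep bad samples out of the record file.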

generate tf.record

    python3 create_tf_record.py \
        --images_dir=path_to_images_dir \
        --annotations_json_dir=path_to_train_annotations_json_dir \
        --label_map_path=path_to_label_map.pbtxt \
        --output_path=path_to_train.record
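The `--label_map_path` flag expects a label map in the text format used by the TensorFlow Object Detection API, where each `item` has a `name` and an `id` starting at 1 (0 is reserved for the background). A minimal sketch for generating such a file follows; the helper name `build_label_map` and the class names are hypothetical.

```python
def build_label_map(class_names):
    """Build label map text with ids starting at 1 (0 is the background)."""
    entries = []
    for index, name in enumerate(class_names, start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return "\n".join(entries)


if __name__ == "__main__":
    # Hypothetical class names; replace them with the labels used in labelme.
    with open("label_map.pbtxt", "w", encoding="utf-8") as f:
        f.write(build_label_map(["cat", "dog"]))
```

The names written here need to match the labels used during annotation.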

The code for create_tf_record.py is on GitHub. Each mask (annotation) file name must correspond one-to-one with an image file name.
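The one-to-one naming requirement can be verified up front. This sketch assumes that images and JSON annotations are paired by file stem (for example `0001.jpg` with `0001.json`); the actual matching rule in create_tf_record.py may differ, and `find_unpaired` is a hypothetical helper.

```python
from pathlib import Path

# Assumed image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unpaired(images_dir, annotations_dir):
    """Return (images without a JSON file, JSON files without an image)."""
    image_stems = {
        p.stem for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    json_stems = {p.stem for p in Path(annotations_dir).glob("*.json")}
    return sorted(image_stems - json_stems), sorted(json_stems - image_stems)
```

Both returned lists should be empty before the record file is generated.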

Download the pretrained Inception model

download tensorflow models

git clone https://github.com/tensorflow/models.git

train

Run the following from the models/research/object_detection directory:

python model_main.py \
    --model_dir=path/to/save/directory \
    --pipeline_config_path=path/to/mask_rcnn_inception_v2_xxx.config

After training finishes, convert the checkpoint (ckpt) into a frozen .pb file with export_inference_graph.py:

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.

#

# Licensed under the Apache License, Version 2.0 (the "License");

# you may not use this file except in compliance with the License.

# You may obtain a copy of the License at

#

#    http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

r"""Tool to export an object detection model for inference.

Prepares an object detection tensorflow graph for inference using model

configuration and a trained checkpoint. Outputs inference

graph, associated checkpoint files, a frozen inference graph and a

SavedModel (https://tensorflow.github.io/serving/serving_basic.html).

The inference graph contains one of three input nodes depending on the user

specified option.

  * `image_tensor`: Accepts a uint8 4-D tensor of shape [None, None, None, 3]

  * `encoded_image_string_tensor`: Accepts a 1-D string tensor of shape [None]

    containing encoded PNG or JPEG images. Image resolutions are expected to be

    the same if more than 1 image is provided.

  * `tf_example`: Accepts a 1-D string tensor of shape [None] containing

    serialized TFExample protos. Image resolutions are expected to be the same

    if more than 1 image is provided.

and the following output nodes returned by the model.postprocess(..):

  * `num_detections`: Outputs float32 tensors of the form [batch]

      that specifies the number of valid boxes per image in the batch.

  * `detection_boxes`: Outputs float32 tensors of the form

      [batch, num_boxes, 4] containing detected boxes.

  * `detection_scores`: Outputs float32 tensors of the form

      [batch, num_boxes] containing class scores for the detections.

  * `detection_classes`: Outputs float32 tensors of the form

      [batch, num_boxes] containing classes for the detections.

  * `detection_masks`: Outputs float32 tensors of the form

      [batch, num_boxes, mask_height, mask_width] containing predicted instance

      masks for each box if its present in the dictionary of postprocessed

      tensors returned by the model.

Notes:

* This tool uses `use_moving_averages` from eval_config to decide which

  weights to freeze.

Example Usage:

--------------

python export_inference_graph \

    --input_type image_tensor \

    --pipeline_config_path path/to/ssd_inception_v2.config \

    --trained_checkpoint_prefix path/to/model.ckpt \

    --output_directory path/to/exported_model_directory

The expected output would be in the directory

path/to/exported_model_directory (which is created if it does not exist)

with contents:

- inference_graph.pbtxt

- model.ckpt.data-00000-of-00001

- model.ckpt.info

- model.ckpt.meta

- frozen_inference_graph.pb

+ saved_model (a directory)

Config overrides (see the `config_override` flag) are text protobufs

(also of type pipeline_pb2.TrainEvalPipelineConfig) which are used to override

certain fields in the provided pipeline_config_path.  These are useful for

making small changes to the inference graph that differ from the training or

eval config.

Example Usage (in which we change the second stage post-processing score

threshold to be 0.5):

python export_inference_graph \

    --input_type image_tensor \

    --pipeline_config_path path/to/ssd_inception_v2.config \

    --trained_checkpoint_prefix path/to/model.ckpt \

    --output_directory path/to/exported_model_directory \

    --config_override " \

            model{ \

              faster_rcnn { \

                second_stage_post_processing { \

                  batch_non_max_suppression { \

                    score_threshold: 0.5 \

                  } \

                } \

              } \

            }"

"""

import tensorflow as tf

from google.protobuf import text_format

from object_detection import exporter

from object_detection.protos import pipeline_pb2

slim = tf.contrib.slim

flags = tf.app.flags

flags.DEFINE_string('input_type', 'image_tensor', 'Type of input node. Can be '

                    'one of [`image_tensor`, `encoded_image_string_tensor`, '

                    '`tf_example`]')

flags.DEFINE_string('input_shape', None,

                    'If input_type is `image_tensor`, this can explicitly set '

                    'the shape of this input tensor to a fixed size. The '

                    'dimensions are to be provided as a comma-separated list '

                    'of integers. A value of -1 can be used for unknown '

                    'dimensions. If not specified, for an `image_tensor, the '

                    'default shape will be partially specified as '

                    '`[None, None, None, 3]`.')

flags.DEFINE_string('pipeline_config_path', None,

                    'Path to a pipeline_pb2.TrainEvalPipelineConfig config '

                    'file.')

flags.DEFINE_string('trained_checkpoint_prefix', None,

                    'Path to trained checkpoint, typically of the form '

                    'path/to/model.ckpt')

flags.DEFINE_string('output_directory', None, 'Path to write outputs.')

flags.DEFINE_string('config_override', '',

                    'pipeline_pb2.TrainEvalPipelineConfig '

                    'text proto to override pipeline_config_path.')

flags.DEFINE_boolean('write_inference_graph', False,

                    'If true, writes inference graph to disk.')

tf.app.flags.mark_flag_as_required('pipeline_config_path')

tf.app.flags.mark_flag_as_required('trained_checkpoint_prefix')

tf.app.flags.mark_flag_as_required('output_directory')

FLAGS = flags.FLAGS

def main(_):

  pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()

  with tf.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f:

    text_format.Merge(f.read(), pipeline_config)

  text_format.Merge(FLAGS.config_override, pipeline_config)

  if FLAGS.input_shape:

    input_shape = [

        int(dim) if dim != '-1' else None

        for dim in FLAGS.input_shape.split(',')

    ]

  else:

    input_shape = None

  exporter.export_inference_graph(

      FLAGS.input_type, pipeline_config, FLAGS.trained_checkpoint_prefix,

      FLAGS.output_directory, input_shape=input_shape,

      write_inference_graph=FLAGS.write_inference_graph)

if __name__ == '__main__':

  tf.app.run()

Run:

python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path mask_rcnn_test/training/mask_rcnn_inception_v2_coco.config \
    --trained_checkpoint_prefix mask_rcnn_test/training/model1/model.ckpt-6414 \
    --output_directory mask_rcnn_test/training/inference2

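Convert the generated frozen graph into the text graph file that OpenCV needs

pb2graph

The script requires the --input, --output and --config arguments (the paths below are placeholders):

python tf_text_graph_rcnn.py \
    --input path/to/frozen_inference_graph.pb \
    --output path/to/output_graph.pbtxt \
    --config path/to/mask_rcnn_inception_v2_xxx.config

Source of tf_text_graph_rcnn.py: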

import argparse

import numpy as np

from tf_text_graph_common import *

parser = argparse.ArgumentParser(description='Run this script to get a text graph of '

                                            'Mask-RCNN model from TensorFlow Object Detection API. '

                                            'Then pass it with .pb file to cv::dnn::readNetFromTensorflow function.')

parser.add_argument('--input', required=True, help='Path to frozen TensorFlow graph.')

parser.add_argument('--output', required=True, help='Path to output text graph.')

parser.add_argument('--config', required=True, help='Path to a *.config file is used for training.')

args = parser.parse_args()

scopesToKeep = ('FirstStageFeatureExtractor', 'Conv',

                'FirstStageBoxPredictor/BoxEncodingPredictor',

                'FirstStageBoxPredictor/ClassPredictor',

                'CropAndResize',

                'MaxPool2D',

                'SecondStageFeatureExtractor',

                'SecondStageBoxPredictor',

                'Preprocessor/sub',

                'Preprocessor/mul',

                'image_tensor')

scopesToIgnore = ('FirstStageFeatureExtractor/Assert',

                  'FirstStageFeatureExtractor/Shape',

                  'FirstStageFeatureExtractor/strided_slice',

                  'FirstStageFeatureExtractor/GreaterEqual',

                  'FirstStageFeatureExtractor/LogicalAnd')

# Load a config file.

config = readTextMessage(args.config)

config = config['model'][0]['faster_rcnn'][0]

num_classes = int(config['num_classes'][0])

grid_anchor_generator = config['first_stage_anchor_generator'][0]['grid_anchor_generator'][0]

scales = [float(s) for s in grid_anchor_generator['scales']]

aspect_ratios = [float(ar) for ar in grid_anchor_generator['aspect_ratios']]

width_stride = float(grid_anchor_generator['width_stride'][0])

height_stride = float(grid_anchor_generator['height_stride'][0])

features_stride = float(config['feature_extractor'][0]['first_stage_features_stride'][0])

first_stage_nms_iou_threshold = float(config['first_stage_nms_iou_threshold'][0])

first_stage_max_proposals = int(config['first_stage_max_proposals'][0])

print('Number of classes: %d' % num_classes)

print('Scales:            %s' % str(scales))

print('Aspect ratios:    %s' % str(aspect_ratios))

print('Width stride:      %f' % width_stride)

print('Height stride:    %f' % height_stride)

print('Features stride:  %f' % features_stride)

# Read the graph.

writeTextGraph(args.input, args.output, ['num_detections', 'detection_scores', 'detection_boxes', 'detection_classes', 'detection_masks'])

graph_def = parseTextGraph(args.output)

removeIdentity(graph_def)

def to_remove(name, op):

    return name.startswith(scopesToIgnore) or not name.startswith(scopesToKeep) or \
           (name.startswith('CropAndResize') and op != 'CropAndResize')

removeUnusedNodesAndAttrs(to_remove, graph_def)

# Connect input node to the first layer

assert(graph_def.node[0].op == 'Placeholder')

graph_def.node[1].input.insert(0, graph_def.node[0].name)

# Temporarily remove top nodes.

topNodes = []

numCropAndResize = 0

while True:

    node = graph_def.node.pop()

    topNodes.append(node)

    if node.op == 'CropAndResize':

        numCropAndResize += 1

        if numCropAndResize == 2:

            break

addReshape('FirstStageBoxPredictor/ClassPredictor/BiasAdd',

          'FirstStageBoxPredictor/ClassPredictor/reshape_1', [0, -1, 2], graph_def)

addSoftMax('FirstStageBoxPredictor/ClassPredictor/reshape_1',

          'FirstStageBoxPredictor/ClassPredictor/softmax', graph_def)  # Compare with Reshape_4

addFlatten('FirstStageBoxPredictor/ClassPredictor/softmax',

          'FirstStageBoxPredictor/ClassPredictor/softmax/flatten', graph_def)

# Compare with FirstStageBoxPredictor/BoxEncodingPredictor/BiasAdd

addFlatten('FirstStageBoxPredictor/BoxEncodingPredictor/BiasAdd',

          'FirstStageBoxPredictor/BoxEncodingPredictor/flatten', graph_def)

proposals = NodeDef()

proposals.name = 'proposals'  # Compare with ClipToWindow/Gather/Gather (NOTE: normalized)

proposals.op = 'PriorBox'

proposals.input.append('FirstStageBoxPredictor/BoxEncodingPredictor/BiasAdd')

proposals.input.append(graph_def.node[0].name)  # image_tensor

proposals.addAttr('flip', False)

proposals.addAttr('clip', True)

proposals.addAttr('step', features_stride)

proposals.addAttr('offset', 0.0)

proposals.addAttr('variance', [0.1, 0.1, 0.2, 0.2])

widths = []

heights = []

for a in aspect_ratios:

    for s in scales:

        ar = np.sqrt(a)

        heights.append((features_stride**2) * s / ar)

        widths.append((features_stride**2) * s * ar)

proposals.addAttr('width', widths)

proposals.addAttr('height', heights)

graph_def.node.extend([proposals])

# Compare with Reshape_5

detectionOut = NodeDef()

detectionOut.name = 'detection_out'

detectionOut.op = 'DetectionOutput'

detectionOut.input.append('FirstStageBoxPredictor/BoxEncodingPredictor/flatten')

detectionOut.input.append('FirstStageBoxPredictor/ClassPredictor/softmax/flatten')

detectionOut.input.append('proposals')

detectionOut.addAttr('num_classes', 2)

detectionOut.addAttr('share_location', True)

detectionOut.addAttr('background_label_id', 0)

detectionOut.addAttr('nms_threshold', first_stage_nms_iou_threshold)

detectionOut.addAttr('top_k', 6000)

detectionOut.addAttr('code_type', "CENTER_SIZE")

detectionOut.addAttr('keep_top_k', first_stage_max_proposals)

detectionOut.addAttr('clip', True)

graph_def.node.extend([detectionOut])

# Save as text.

cropAndResizeNodesNames = []

for node in reversed(topNodes):

    if node.op != 'CropAndResize':

        graph_def.node.extend([node])

        topNodes.pop()

    else:

        cropAndResizeNodesNames.append(node.name)

        if numCropAndResize == 1:

            break

        else:

            graph_def.node.extend([node])

            topNodes.pop()

            numCropAndResize -= 1

addSoftMax('SecondStageBoxPredictor/Reshape_1', 'SecondStageBoxPredictor/Reshape_1/softmax', graph_def)

addSlice('SecondStageBoxPredictor/Reshape_1/softmax',

        'SecondStageBoxPredictor/Reshape_1/slice',

        [0, 0, 1], [-1, -1, -1], graph_def)

addReshape('SecondStageBoxPredictor/Reshape_1/slice',

          'SecondStageBoxPredictor/Reshape_1/Reshape', [1, -1], graph_def)

# Replace Flatten subgraph onto a single node.

for i in reversed(range(len(graph_def.node))):

    if graph_def.node[i].op == 'CropAndResize':

        graph_def.node[i].input.insert(1, 'detection_out')

    if graph_def.node[i].name == 'SecondStageBoxPredictor/Reshape':

        addConstNode('SecondStageBoxPredictor/Reshape/shape2', [1, -1, 4], graph_def)

        graph_def.node[i].input.pop()

        graph_def.node[i].input.append('SecondStageBoxPredictor/Reshape/shape2')

    if graph_def.node[i].name in ['SecondStageBoxPredictor/Flatten/flatten/Shape',

                                  'SecondStageBoxPredictor/Flatten/flatten/strided_slice',

                                  'SecondStageBoxPredictor/Flatten/flatten/Reshape/shape',

                                  'SecondStageBoxPredictor/Flatten_1/flatten/Shape',

                                  'SecondStageBoxPredictor/Flatten_1/flatten/strided_slice',

                                  'SecondStageBoxPredictor/Flatten_1/flatten/Reshape/shape']:

        del graph_def.node[i]

for node in graph_def.node:

    if node.name == 'SecondStageBoxPredictor/Flatten/flatten/Reshape' or \
       node.name == 'SecondStageBoxPredictor/Flatten_1/flatten/Reshape':

        node.op = 'Flatten'

        node.input.pop()

    if node.name in ['FirstStageBoxPredictor/BoxEncodingPredictor/Conv2D',

                    'SecondStageBoxPredictor/BoxEncodingPredictor/MatMul']:

        node.addAttr('loc_pred_transposed', True)

    if node.name.startswith('MaxPool2D'):

        assert(node.op == 'MaxPool')

        assert(len(cropAndResizeNodesNames) == 2)

        node.input = [cropAndResizeNodesNames[0]]

        del cropAndResizeNodesNames[0]

################################################################################

### Postprocessing

################################################################################

addSlice('detection_out', 'detection_out/slice', [0, 0, 0, 3], [-1, -1, -1, 4], graph_def)

variance = NodeDef()

variance.name = 'proposals/variance'

variance.op = 'Const'

variance.addAttr('value', [0.1, 0.1, 0.2, 0.2])

graph_def.node.extend([variance])

varianceEncoder = NodeDef()

varianceEncoder.name = 'variance_encoded'

varianceEncoder.op = 'Mul'

varianceEncoder.input.append('SecondStageBoxPredictor/Reshape')

varianceEncoder.input.append(variance.name)

varianceEncoder.addAttr('axis', 2)

graph_def.node.extend([varianceEncoder])

addReshape('detection_out/slice', 'detection_out/slice/reshape', [1, 1, -1], graph_def)

addFlatten('variance_encoded', 'variance_encoded/flatten', graph_def)

detectionOut = NodeDef()

detectionOut.name = 'detection_out_final'

detectionOut.op = 'DetectionOutput'

detectionOut.input.append('variance_encoded/flatten')

detectionOut.input.append('SecondStageBoxPredictor/Reshape_1/Reshape')

detectionOut.input.append('detection_out/slice/reshape')

detectionOut.addAttr('num_classes', num_classes)

detectionOut.addAttr('share_location', False)

detectionOut.addAttr('background_label_id', num_classes + 1)

detectionOut.addAttr('nms_threshold', 0.6)

detectionOut.addAttr('code_type', "CENTER_SIZE")

detectionOut.addAttr('keep_top_k',100)

detectionOut.addAttr('clip', True)

detectionOut.addAttr('variance_encoded_in_target', True)

detectionOut.addAttr('confidence_threshold', 0.3)

detectionOut.addAttr('group_by_classes', False)

graph_def.node.extend([detectionOut])

for node in reversed(topNodes):

    graph_def.node.extend([node])

    if node.name.startswith('MaxPool2D'):

        assert(node.op == 'MaxPool')

        assert(len(cropAndResizeNodesNames) == 1)

        node.input = [cropAndResizeNodesNames[0]]

for i in reversed(range(len(graph_def.node))):

    if graph_def.node[i].op == 'CropAndResize':

        graph_def.node[i].input.insert(1, 'detection_out_final')

        break

graph_def.node[-1].name = 'detection_masks'

graph_def.node[-1].op = 'Sigmoid'

graph_def.node[-1].input.pop()

# Save as text.

graph_def.save(args.output)
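The anchor width and height loop in the script above (over `aspect_ratios` and `scales`) can be tried in isolation to see which box sizes a given config produces. This is a small sketch that mirrors that loop using only the standard library; the function name `anchor_sizes` and the sample values are illustrative assumptions, not values taken from a real config.

```python
import math


def anchor_sizes(scales, aspect_ratios, features_stride):
    """Mirror the script's loop: one (width, height) per aspect ratio and scale."""
    widths, heights = [], []
    for a in aspect_ratios:
        for s in scales:
            ar = math.sqrt(a)
            heights.append((features_stride ** 2) * s / ar)
            widths.append((features_stride ** 2) * s * ar)
    return widths, heights


if __name__ == "__main__":
    # Illustrative values only; read the real ones from the pipeline config.
    w, h = anchor_sizes([0.25, 1.0], [1.0, 4.0], 16)
    for width, height in zip(w, h):
        print("anchor %.1f x %.1f" % (width, height))
```

With a stride of 16, scale 1.0 and aspect ratio 1.0, the loop yields a 256 x 256 anchor; aspect ratio 4.0 stretches it to 512 x 128.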

The OpenCV inference result is shown in the figure below; see GitHub for the full inference code.

[Figure: OpenCV inference result]
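For orientation, here is a minimal Python sketch of OpenCV-side inference. It is not the author's GitHub code. It assumes the frozen .pb file and the text graph produced by the script above, and it uses the output names `detection_out_final` and `detection_masks` that the script assigns. The function names, the thresholds, and the detection row layout `[batch_id, class_id, score, left, top, right, bottom]` with normalized coordinates are assumptions based on OpenCV's Mask R-CNN sample.

```python
def box_to_pixels(detection, image_width, image_height):
    """Convert normalized box corners to pixel coordinates clamped to the image.

    Assumed row layout: [batch_id, class_id, score, left, top, right, bottom].
    """
    left = int(detection[3] * image_width)
    top = int(detection[4] * image_height)
    right = int(detection[5] * image_width)
    bottom = int(detection[6] * image_height)
    left = max(0, min(left, image_width - 1))
    top = max(0, min(top, image_height - 1))
    right = max(0, min(right, image_width - 1))
    bottom = max(0, min(bottom, image_height - 1))
    return left, top, right, bottom


def run_mask_rcnn(pb_path, pbtxt_path, image_path, score_threshold=0.5):
    """Run the exported model and return (class_id, score, box, mask) tuples."""
    import cv2  # imported here so box_to_pixels works without OpenCV installed

    net = cv2.dnn.readNetFromTensorflow(pb_path, pbtxt_path)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    height, width = image.shape[:2]

    # The model was trained on RGB input, while OpenCV loads images as BGR.
    blob = cv2.dnn.blobFromImage(image, swapRB=True, crop=False)
    net.setInput(blob)
    boxes, masks = net.forward(["detection_out_final", "detection_masks"])

    results = []
    for i in range(boxes.shape[2]):
        detection = boxes[0, 0, i]
        score = float(detection[2])
        if score < score_threshold:
            continue
        class_id = int(detection[1])
        box = box_to_pixels(detection, width, height)
        left, top, right, bottom = box
        # Resize the low-resolution class mask to the box size, then binarize.
        mask = cv2.resize(masks[i, class_id],
                          (right - left + 1, bottom - top + 1))
        results.append((class_id, score, box, mask > 0.3))
    return results
```

A call such as `run_mask_rcnn("frozen_inference_graph.pb", "output_graph.pbtxt", "test.jpg")` uses placeholder paths; substitute the files produced in the earlier steps.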
