For particular reasons, I needed to debug the code here...
Code repository
Original code
Discussion with the author
The code was debugged on Ubuntu.
Contents:
Preparing the VOC data
Use convert_data_to_tfrecord.py to build the tfrecord data.
tf.app.flags.DEFINE_string('VOC_dir', 'data/{}/'.format(cfgs.DATASET_NAME), 'Voc dir')
tf.app.flags.DEFINE_string('xml_dir', 'Annotations', 'xml dir')
tf.app.flags.DEFINE_string('image_dir', 'JPEGImages', 'image dir')
tf.app.flags.DEFINE_string('save_name', 'train', 'save name')
tf.app.flags.DEFINE_string('save_dir', cfgs.ROOT_PATH + '/data/tfrecords/', 'save name')
tf.app.flags.DEFINE_string('img_format', '.jpg', 'format of image')
These flags are just the data directories; pascal is the directory that holds the data.
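For reference, the layout these flags assume would look something like this (hypothetical paths, inferred from the flag defaults above):

```shell
# Hypothetical layout inferred from the flag defaults:
mkdir -p data/pascal/Annotations   # VOC xml annotation files go here
mkdir -p data/pascal/JPEGImages    # the matching .jpg images go here
mkdir -p data/tfrecords            # convert_data_to_tfrecord.py writes here
```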
- Modify line 141:
'img_name': _bytes_feature(img_name.encode('utf-8')),
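This fix is needed because on Python 3 `str` is unicode text, while the protobuf `BytesList` field behind `_bytes_feature` wants raw bytes. A quick check (file name is made up):

```python
# On Python 3, str is unicode text; a protobuf BytesList field needs bytes,
# which is why line 141 adds .encode('utf-8').
img_name = '2008_000008.jpg'  # hypothetical file name
encoded = img_name.encode('utf-8')
print(type(encoded).__name__)  # bytes
```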
Run the following (I did not change anything else):
python convert_data_to_tfrecord.py
This produces train.tfrecord.
Splitting the data into two halves gives you separate train and test sets.
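One simple way to do that split (a sketch, not part of the repo): shuffle the sample names once with a fixed seed and cut off a test fraction.

```python
import random

def split_train_test(names, test_ratio=0.2, seed=0):
    """Deterministically shuffle sample names and split into (train, test)."""
    names = sorted(names)              # fixed order before shuffling
    random.Random(seed).shuffle(names)
    n_test = int(len(names) * test_ratio)
    return names[n_test:], names[:n_test]

train, test = split_train_test(['img%02d' % i for i in range(10)])
print(len(train), len(test))  # 8 2
```

Running convert_data_to_tfrecord.py once per half (e.g. with --save_name=train and --save_name=test) then yields the two tfrecords.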
Start training
The VOC data has only 20 classes.
Modify config.py as follows; the defaults do not match VOC.
NET_NAME = 'resnet_v1_101'
DATASET_NAME = 'pascal'
VERSION = 'v1_{}'.format(DATASET_NAME)
CLASS_NUM = 20 # exclude background
- Training without a pretrained model
Modify train.py by commenting out the following section, then start training:
#restorer, restore_ckpt = restore_model.get_restorer(test=False)
saver = tf.train.Saver(max_to_keep=3)
config = tf.ConfigProto()
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
sess.run(init_op)
# if not restorer is None:
# restorer.restore(sess, restore_ckpt)
# print('restore model')
Modify the save paths in train.py:
summary_path = os.path.join('../output/{}'.format(cfgs.DATASET_NAME),
FLAGS.summary_path, cfgs.VERSION)
...
save_dir = os.path.join('../output/{}'.format(cfgs.DATASET_NAME),
FLAGS.trained_checkpoint, cfgs.VERSION)
You then get an output directory at the project root.
Training:
python train.py
It runs as follows:
Trained weights:
Loss curve:
- Training from pretrained weights
Modify the flags in config_res101.py:
tf.app.flags.DEFINE_string(
'pretrained_model_path',
'../output/pascal/res101_trained_weights/v1_pascal/pascal_500model.ckpt',
#YangHe_MyCode/FPN_TensorFlow-master/output/pascal/res101_trained_weights
# 'output-1/res101_trained_weights/v1_layer/voc_50000model.ckpt',
'the path of pretrained weights'
)
At the same time, restore the code that was commented out earlier:
restorer, restore_ckpt = restore_model.get_restorer(test=False)
saver = tf.train.Saver(max_to_keep=3)
config = tf.ConfigProto()
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
sess.run(init_op)
if not restorer is None:
restorer.restore(sess, restore_ckpt)
print('restore model')
Now training works:
python train.py
eval.py
Evaluate the model. I could not get these options to work as command-line arguments (my fault, not the script's), so I could only set default values.
As follows:
Note: the data used here is the pascal_test.tfrecord produced earlier.
That's it.
Two of the arguments, src_folder and des_folder, were unused, so I deleted them.
parser = argparse.ArgumentParser(description='Evaluate a trained FPN model')
parser.add_argument('--weights', dest='weights',
help='model path',
default='../output/pascal/res101_trained_weights/v1_pascal/pascal_500model.ckpt',
type=str)
parser.add_argument('--img_num', dest='img_num',
help='image numbers',
default=20, type=int)
Set the trained-weights path here.
Also fix the pickle read and write; there is a bug here.
- The pickle write in eval.py
- The pickle read in eval.py
fr1 = open('predict_dict.pkl', 'rb')
fr2 = open('gtboxes_dict.pkl', 'rb')
predict_dict = pickle.load(fr1,encoding='iso-8859-1')
gtboxes_dict = pickle.load(fr2,encoding='iso-8859-1')
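The usual culprit is dumping in text mode or with an incompatible protocol. A sketch of a write/read pair that matches the read snippet above (the dictionary contents are made up):

```python
import os
import pickle
import tempfile

predict_dict = {'000001.jpg': [[0.9, 48, 240, 195, 371]]}  # made-up contents

path = os.path.join(tempfile.mkdtemp(), 'predict_dict.pkl')

# Write side: binary mode, protocol 2 so Python 2 and 3 can both read it.
with open(path, 'wb') as fw:
    pickle.dump(predict_dict, fw, protocol=2)

# Read side, mirroring the snippet above.
with open(path, 'rb') as fr:
    loaded = pickle.load(fr, encoding='iso-8859-1')

print(loaded == predict_dict)  # True
```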
One more change: add an `if len(rboxes) != 0` check to keep `rec[-1]` from raising an IndexError when a class has no detections.
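A sketch of that guard (the rboxes/rec names follow eval.py; the surrounding context is assumed):

```python
def final_recall(rboxes, rec):
    """Return the last recall value, or 0.0 when no boxes were detected.

    Without the length check, rec[-1] raises an IndexError for classes
    that produced no detections.
    """
    if len(rboxes) != 0:
        return rec[-1]
    return 0.0

print(final_recall([], []))                       # 0.0
print(final_recall([[0, 0, 1, 1]], [0.25, 0.5]))  # 0.5
```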
Run eval.py:
python eval.py
This produces two files, gtboxes_dict.pkl and predict_dict.pkl.
It also prints the mAP.
The mAP, spelled out:
You read that right: mAP = 0 here......
Things go wrong
2018-11-27 09:33:31: step2050 image_name:b'2008_008546.jpg'
rpn_loc_loss:0.2765 | rpn_cla_loss:0.2248 | rpn_total_loss:0.5013
fast_rcnn_loc_loss:0.0773 | fast_rcnn_cla_loss:0.2025 | fast_rcnn_total_loss:0.2797
added_loss:0.7811 | total_loss:6.3684 | pre_cost_time:0.3456s
2018-11-27 09:33:48: step2100 image_name:b'2008_008745.jpg'
rpn_loc_loss:0.2128 | rpn_cla_loss:0.1975 | rpn_total_loss:0.4103
fast_rcnn_loc_loss:0.1009 | fast_rcnn_cla_loss:0.0907 | fast_rcnn_total_loss:0.1916
added_loss:0.6019 | total_loss:6.1892 | pre_cost_time:0.3453s
2018-11-27 09:34:10: step2150 image_name:b'2009_000161.jpg'
rpn_loc_loss:nan | rpn_cla_loss:0.6903 | rpn_total_loss:nan
fast_rcnn_loc_loss:nan | fast_rcnn_cla_loss:3.0358 | fast_rcnn_total_loss:nan
added_loss:nan | total_loss:10000000000.0000 | pre_cost_time:0.3478s
2018-11-27 09:34:29: step2200 image_name:b'2009_000393.jpg'
rpn_loc_loss:nan | rpn_cla_loss:0.6902 | rpn_total_loss:nan
fast_rcnn_loc_loss:nan | fast_rcnn_cla_loss:3.0358 | fast_rcnn_total_loss:nan
added_loss:nan | total_loss:10000000000.0000 | pre_cost_time:0.3507s
The loss goes NaN, presumably because the loc_loss here is buggy.
Fixing the loss
def l1_smooth_losses(predict_boxes, gtboxes, object_weights, classes_weights=None):
'''
:param predict_boxes: [minibatch_size, -1]
:param gtboxes: [minibatch_size, -1]
:param object_weights: [minibatch_size, ]. 1.0 represent object, 0.0 represent others(ignored or background)
:return:
'''
diff = predict_boxes - gtboxes
abs_diff = tf.cast(tf.abs(diff), tf.float32)
if classes_weights is None:
'''
first_stage:
predict_boxes :[minibatch_size, 4]
gtboxes: [minibatchs_size, 4]
'''
anchorwise_smooth_l1norm = tf.reduce_sum(
tf.where(tf.less(abs_diff, 1), 0.5 * tf.square(abs_diff), abs_diff - 0.5), axis=1) * object_weights
else:
'''
fast_rcnn:
predict_boxes: [minibatch_size, 4*num_classes]
gtboxes: [minibatch_size, 4*num_classes]
classes_weights : [minibatch_size, 4*num_classes]
'''
anchorwise_smooth_l1norm = tf.reduce_sum(tf.where(tf.less(abs_diff, 1), 0.5*tf.square(
abs_diff)*classes_weights, (abs_diff - 0.5)*classes_weights), axis=1)*object_weights
anchorwise_smooth_l1norm = tf.clip_by_value(anchorwise_smooth_l1norm, 1e-10, 1e10)
return tf.reduce_mean(anchorwise_smooth_l1norm, axis=0) # reduce mean
The whole change is this added line:
tf.clip_by_value(anchorwise_smooth_l1norm, 1e-10, 1e10)
return tf.reduce_mean(anchorwise_smooth_l1norm, axis=0)
Even so, "signed integer is less than minimum" still shows up:
2018-11-27 11:00:27: step1150 image_name:b'2008_004903.jpg'
rpn_loc_loss:0.1078 | rpn_cla_loss:0.1171 | rpn_total_loss:0.2249
fast_rcnn_loc_loss:0.0625 | fast_rcnn_cla_loss:0.0605 | fast_rcnn_total_loss:0.1230
added_loss:0.3479 | total_loss:5.9376 | pre_cost_time:0.4136s
signed integer is less than minimum
signed integer is less than minimum
2018-11-27 11:00:49: step1200 image_name:b'2008_005101.jpg'
rpn_loc_loss:0.3705 | rpn_cla_loss:0.2455 | rpn_total_loss:0.6160
fast_rcnn_loc_loss:0.0000 | fast_rcnn_cla_loss:0.0069 | fast_rcnn_total_loss:0.0069
added_loss:0.6230 | total_loss:6.2127 | pre_cost_time:0.3253s
2018-11-27 11:01:11: step1250 image_name:b'2008_005321.jpg'
rpn_loc_loss:0.0295 | rpn_cla_loss:0.0396 | rpn_total_loss:0.0692
fast_rcnn_loc_loss:0.0170 | fast_rcnn_cla_loss:0.0240 | fast_rcnn_total_loss:0.0410
added_loss:0.1102 | total_loss:5.6999 | pre_cost_time:0.3905s
2018-11-27 11:01:33: step1300 image_name:b'2008_005514.jpg'
rpn_loc_loss:0.0564 | rpn_cla_loss:0.0632 | rpn_total_loss:0.1196
fast_rcnn_loc_loss:0.0000 | fast_rcnn_cla_loss:0.0057 | fast_rcnn_total_loss:0.0057
added_loss:0.1253 | total_los
Change it to this instead:
def l1_smooth_losses(predict_boxes, gtboxes, object_weights, classes_weights=None):
'''
:param predict_boxes: [minibatch_size, -1]
:param gtboxes: [minibatch_size, -1]
:param object_weights: [minibatch_size, ]. 1.0 represent object, 0.0 represent others(ignored or background)
:return:
'''
diff = predict_boxes - gtboxes
abs_diff = tf.cast(tf.abs(diff), tf.float32)
abs_diff = tf.clip_by_value(abs_diff, 1e-5, 1e2) # clip
if classes_weights is None:
'''
first_stage:
predict_boxes :[minibatch_size, 4]
gtboxes: [minibatchs_size, 4]
'''
anchorwise_smooth_l1norm = tf.reduce_sum(
tf.where(tf.less(abs_diff, 1), 0.5 * tf.square(abs_diff), abs_diff - 0.5), axis=1) * object_weights
else:
'''
fast_rcnn:
predict_boxes: [minibatch_size, 4*num_classes]
gtboxes: [minibatch_size, 4*num_classes]
classes_weights : [minibatch_size, 4*num_classes]
'''
anchorwise_smooth_l1norm = tf.reduce_sum(tf.where(tf.less(abs_diff, 1), 0.5*tf.square(
abs_diff)*classes_weights, (abs_diff - 0.5)*classes_weights), axis=1)*object_weights
return tf.reduce_mean(anchorwise_smooth_l1norm, axis=0) # reduce mean
With this, at least it no longer crashes.
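A NumPy stand-in for the TF code above shows what the two clips do (illustration only; the real losses run inside the graph on tensors):

```python
import numpy as np

def smooth_l1(abs_diff):
    """Element-wise smooth L1: 0.5*x**2 below 1, x - 0.5 at or above."""
    return np.where(abs_diff < 1, 0.5 * abs_diff ** 2, abs_diff - 0.5)

abs_diff = np.array([3e12, 0.5])  # one wildly wrong box coordinate

raw = smooth_l1(abs_diff).sum()                       # unbounded, grows with the error
fix1 = np.clip(raw, 1e-10, 1e10)                      # first fix: clip the summed loss
fix2 = smooth_l1(np.clip(abs_diff, 1e-5, 1e2)).sum()  # second fix: clip the diff itself

print(raw, fix1, fix2)  # 2999999999999.625 10000000000.0 99.625
```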
While checking the data I found that the dataset conversion itself was wrong: the training data was completely mislabeled.
The fix:
# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import
import sys
sys.path.append('../../')
import xml.etree.cElementTree as ET
import numpy as np
import tensorflow as tf
import glob
import cv2
from help_utils.tools import *
from libs.label_name_dict.label_dict import *
from lxml import etree
'''
python convert_data_to_tfrecord.py --VOC_dir=VOCdevkit_train/ --save_name=train --dataset=pascal
'''
tf.app.flags.DEFINE_string('VOC_dir', None, 'Voc dir')
tf.app.flags.DEFINE_string('xml_dir', 'Annotations', 'xml dir')
tf.app.flags.DEFINE_string('image_dir', 'JPEGImages', 'image dir')
tf.app.flags.DEFINE_string('save_name', 'train', 'save name')
tf.app.flags.DEFINE_string('save_dir', cfgs.ROOT_PATH + '/data/tfrecords/', 'save name')
tf.app.flags.DEFINE_string('img_format', '.jpg', 'format of image')
tf.app.flags.DEFINE_string('dataset', 'car', 'dataset')
FLAGS = tf.app.flags.FLAGS
def _int64_feature(value):
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def _bytes_feature(value):
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def recursive_parse_xml_to_dict(xml):
"""Recursively parses XML contents to python dict.
We assume that `object` tags are the only ones that can appear
multiple times at the same level of a tree.
Args:
xml: xml tree obtained by parsing XML file contents using lxml.etree
Returns:
Python dictionary holding XML contents.
"""
if not xml:
return {xml.tag: xml.text}
result = {}
for child in xml:
child_result = recursive_parse_xml_to_dict(child)
if child.tag != 'object':
result[child.tag] = child_result[child.tag]
else:
if child.tag not in result:
result[child.tag] = []
result[child.tag].append(child_result[child.tag])
return {xml.tag: result}
def read_xml_gtbox_and_label(xml_path):
"""
:param xml_path: the path of voc xml
:return: a list contains gtboxes and labels, shape is [num_of_gtboxes, 5],
and has [xmin, ymin, xmax, ymax, label] in a per row
"""
box_list = []
with open(xml_path) as f:
xml_str = f.read()
#show_all_image_test()
xml = etree.fromstring(xml_str)
data = recursive_parse_xml_to_dict(xml)['annotation']
img_width = int(data['size']['width'])
img_height = int(data['size']['height'])
for obj in data['object']:
xmin = int(obj['bndbox']['xmin'])
ymin = int(obj['bndbox']['ymin'])
ymax = int(obj['bndbox']['ymax'])
xmax = int(obj['bndbox']['xmax'])
label = NAME_LABEL_MAP[obj['name']]
box_list.append([ymin, xmin, ymax, xmax, label])
gtbox_label = np.array(box_list, dtype=np.int32)
ymin, xmin, ymax, xmax, label = gtbox_label[:, 0], gtbox_label[:, 1], gtbox_label[:, 2], gtbox_label[:, 3], \
gtbox_label[:, 4]
xmin = np.where(xmin <= 0, 0, xmin)
ymin = np.where(ymin <= 0, 0, ymin)
xmax = np.where(xmax >= img_width, img_width , xmax)
ymax = np.where(ymax >= img_height, img_height, ymax)
gtbox_label = np.transpose(np.stack([ymin, xmin, ymax, xmax, label], axis=0)) # [ymin, xmin, ymax, xmax, label]
return img_height, img_width, gtbox_label
def convert_pascal_to_tfrecord():
xml_path = FLAGS.VOC_dir + FLAGS.xml_dir
image_path = FLAGS.VOC_dir + FLAGS.image_dir
save_path = FLAGS.save_dir + FLAGS.dataset + '_' + FLAGS.save_name + '.tfrecord'
mkdir(FLAGS.save_dir)
# writer_options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.ZLIB)
# writer = tf.python_io.TFRecordWriter(path=save_path, options=writer_options)
writer = tf.python_io.TFRecordWriter(path=save_path)
for count, xml in enumerate(glob.glob(xml_path + '/*.xml')):
# to avoid path error in different development platform
xml = xml.replace('\\', '/')
img_name = xml.split('/')[-1].split('.')[0] + FLAGS.img_format
img_path = image_path + '/' + img_name
if not os.path.exists(img_path):
print('{} is not exist!'.format(img_path))
continue
img_height, img_width, gtbox_label = read_xml_gtbox_and_label(xml)
# img = np.array(Image.open(img_path))
img = cv2.imread(img_path)
feature = tf.train.Features(feature={
# maybe do not need encode() in linux
'img_name': _bytes_feature(img_name.encode('utf8')),
'img_height': _int64_feature(img_height),
'img_width': _int64_feature(img_width),
'img': _bytes_feature(img.tostring()),
'gtboxes_and_label': _bytes_feature(gtbox_label.tostring()),
'num_objects': _int64_feature(gtbox_label.shape[0])
})
example = tf.train.Example(features=feature)
writer.write(example.SerializeToString())
view_bar('Conversion progress', count + 1, len(glob.glob(xml_path + '/*.xml')))
print('\nConversion is complete!')
def show_all_image_test():
NAME_LABEL = list(NAME_LABEL_MAP.keys())
xml_path = 'VOCdevkit_train/' + FLAGS.xml_dir
image_path = 'VOCdevkit_train/'+ FLAGS.image_dir
for count, xml in enumerate(glob.glob(xml_path + '/*.xml')):
# to avoid path error in different development platform
xml = xml.replace('\\', '/')
img_name = xml.split('/')[-1].split('.')[0] + FLAGS.img_format
img_path = image_path + '/' + img_name
if not os.path.exists(img_path):
print('{} is not exist!'.format(img_path))
continue
img_height, img_width, gtbox_label = read_xml_gtbox_and_label(xml)
image = cv2.imread(img_path)
for i in range(len(gtbox_label)):
object = gtbox_label[i]
ymin, xmin, ymax, xmax, label = object
image = cv2.rectangle(image, (object[1], object[0]),
(object[3], object[2]),
color=(0, 255, 0))
cv2.putText(image,
text=str(len(gtbox_label)),
org=((image.shape[1]) // 2, (image.shape[0]) // 2),
fontFace=3,
fontScale=1,
color=(255, 0, 0))
if ymin <= 0 or xmin <= 0 or ymax >= img_height or xmax>=img_width:
cv2.putText(image,
text='error',
org=((image.shape[1]) // 2, (image.shape[0]) // 2),
fontFace=3,
fontScale=1,
color=(255, 0, 0))
else:
cv2.putText(image,
text=str(NAME_LABEL[object[4]]),
org=(object[1], object[0] + 10),
fontFace=1,
fontScale=1,
thickness=2,
color=(255, 0, 0))
cv2.imshow("s", image)
cv2.waitKey(500)
if __name__ == '__main__':
# xml_path = 'VOCdevkit_test/Annotations/2008_000082.xml'
# print(read_xml_gtbox_and_label(xml_path)) # show xml
show_all_image_test() # show labels and images with cv2
#convert_pascal_to_tfrecord() # create the dataset
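To sanity-check an annotation without the full pipeline, the parsing step can be reproduced with the stdlib ElementTree (a sketch; the real script uses lxml and the recursive dict parser above, and the XML values here are made up):

```python
import xml.etree.ElementTree as ET

# Minimal VOC-style annotation with made-up values.
VOC_XML = '''<annotation>
  <size><width>500</width><height>375</height></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>'''

root = ET.fromstring(VOC_XML)
width = int(root.findtext('size/width'))
height = int(root.findtext('size/height'))
boxes = []
for obj in root.findall('object'):
    box = [int(obj.findtext('bndbox/' + k)) for k in ('xmin', 'ymin', 'xmax', 'ymax')]
    boxes.append(box + [obj.findtext('name')])

print(width, height, boxes)  # 500 375 [[48, 240, 195, 371, 'dog']]
```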
Update
The author's mAP code is fairly complete too; it comes from facebookresearch's evaluation code on GitHub, so there is no need to worry that the mAP is not official enough.
Here is an explanation of the mAP code.
Faster R-CNN debugging notes recommended by the author
Update
Today the author updated the FPN code; its style follows directly from the Faster R-CNN code.
FPN_Tensorflow
The data preparation differs only slightly from before:
gtbox_label = np.transpose(np.stack([xmin,ymin,xmax,ymax, label], axis=0))
# FPN old is [ymin, xmin, ymax, xmax, label]
# FPN new is [xmin,ymin,xmax,ymax, label]
# Faster rcnn [xmin,ymin,xmax,ymax, label]
return img_height, img_width, gtbox_label
In the new version, the FPN data format is the same as Faster R-CNN's.
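If you already have tfrecords in the old [ymin, xmin, ymax, xmax, label] order, the rows can be re-ordered with a small helper (a sketch, not part of the repo):

```python
import numpy as np

def old_fpn_to_new(gtbox_label):
    """Reorder [ymin, xmin, ymax, xmax, label] rows (old FPN) into
    [xmin, ymin, xmax, ymax, label] (new FPN / Faster R-CNN)."""
    ymin, xmin, ymax, xmax, label = gtbox_label.T
    return np.stack([xmin, ymin, xmax, ymax, label], axis=1)

old = np.array([[240, 48, 371, 195, 7]], dtype=np.int32)
print(old_fpn_to_new(old))
```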
Submitting VOC results
Here is the evaluation-server result after training from the pretrained weights:
http://host.robots.ox.ac.uk/anonymous/RXNDLK.html
The results look quite good.