Original article: https://finthon.com/resnet-captcha/
Introduction
In the earlier article on quickly recognizing captchas with Python + TensorFlow CNNs, we used a simple three-layer convolutional network and the accuracy was not particularly high. Although ResNet is widely used, I had never found the time to try it on that captcha project. So I spent an afternoon going through the whole project from start to finish: a hands-on exercise in recognizing captchas with a ResNet network, ending in a high-accuracy model. In this article I explain in detail how to preprocess the images, build tfrecord files, and set up the training and prediction models, all the way to a high-accuracy machine learning model.
Runtime environment:
- Python 3.6.5
- TensorFlow 1.8.0 (GPU build)
Image processing
I reuse the captchas from the previous article for this test; after all, labeling those 3430 images took about six hours. The original captchas look like this:
After binarization, they look like this:
See the previous article for the preprocessing details.
Image resizing
Today's project starts from these preprocessed captchas. Since we use the ResNet50 network from the slim framework, which expects inputs of size 224×224×3, the images need to be resized to a uniform shape up front, which makes the later steps easier.
import os
import numpy as np
from PIL import Image

# Directory of the binarized captchas
captcha_path = r'F:\resnet_for_captcha\captcha4'
# Directory for the resized images
save_path = r'F:\resnet_for_captcha\1resize\resize_path'

for i in os.listdir(captcha_path):
    img = Image.open(os.path.join(captcha_path, i))
    # np.array(img) is boolean here (left over from the binarization step), so cast and scale to 0/255
    arr = np.array(img).astype('uint8') * 255
    im = Image.fromarray(arr)
    im = im.resize((224, 224))
    arr = np.array(im)
    x = np.zeros([224, 224, 3])  # build a 224*224*3 array
    for ii in range(224):
        for jj in range(224):
            x[ii, jj, :] = arr[ii, jj]  # copy the single channel into all three channels
    im = Image.fromarray(x.astype('uint8'))  # image arrays must be uint8
    im.save(os.path.join(save_path, i))
A few points to note here: the original images are 114×450; turning them into 224×224×3 effectively adds a third dimension whose values are identical across all three channels, and the image array must be converted to uint8, which represents pixel values in the 0-255 range. After the conversion the images look like this:
The characters are somewhat stretched, but that is acceptable for machine learning.
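As an aside, the per-pixel double loop above can be replaced by a single numpy call. The following is a minimal sketch of the same grayscale-to-three-channel conversion using np.stack (the file names here are just placeholders):

import numpy as np
from PIL import Image

img = Image.open('captcha_binarized.png')                # placeholder file name
arr = np.array(img).astype('uint8') * 255                # bool -> uint8, values 0 or 255
arr = np.array(Image.fromarray(arr).resize((224, 224)))
rgb = np.stack([arr, arr, arr], axis=-1)                 # shape (224, 224, 3), all channels identical
Image.fromarray(rgb).save('captcha_224.png')             # rgb is already uint8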
tfrecord format
To make training more efficient, we need a queue-based input pipeline that cuts down the time the model spends waiting for data. TensorFlow provides the tfrecord file format, which works well together with such input queues. So after resizing the captcha images, we convert them into tfrecord files.
import os
import tensorflow as tf
from PIL import Image
import numpy as np


# Convert a captcha's name (its label) into a one-hot encoded vector
def name2vec(name):
    vector = np.zeros(6 * 26)
    for i, c in enumerate(name):
        idx = i * 26 + ord(c) - 97
        vector[idx] = 1
    return vector


# Directory of the resized images
cwd = r'F:\resnet_for_captcha\1resize\resize_path'
# Output directory for the tfrecord files
file_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
# Number of images per tfrecord file
bestnum = 1000
# Index of the current image
num = 0
# Index of the current tfrecord file
recordfilenum = 0
# tfrecord file name
ftrecordfilename = ("train.tfrecords-%.3d" % recordfilenum)
writer = tf.python_io.TFRecordWriter(os.path.join(file_path, ftrecordfilename))

for i in os.listdir(cwd):
    num += 1
    print(num)
    if num > bestnum:
        num = 1
        recordfilenum += 1
        writer.close()  # close the finished file before starting the next one
        ftrecordfilename = ("train.tfrecords-%.3d" % recordfilenum)
        writer = tf.python_io.TFRecordWriter(os.path.join(file_path, ftrecordfilename))
    name = os.path.splitext(i)[0]
    name_vec = name2vec(name).tobytes()  # serialize the label vector to raw bytes
    img = Image.open(os.path.join(cwd, i), 'r')
    img_raw = img.tobytes()
    example = tf.train.Example(
        features=tf.train.Features(feature={
            'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[name_vec])),
            'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
        }))
    writer.write(example.SerializeToString())
writer.close()
We put 1000 images into each tfrecord file, storing both the pixel array and the label of every image as raw byte streams. The resulting tfrecord files look like this:
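If you want a quick sanity check of the output, one simple way (a sketch, assuming the output directory above) is to count the records in each file with tf.python_io.tf_record_iterator:

import os
import tensorflow as tf

file_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
for name in sorted(os.listdir(file_path)):
    # tf_record_iterator yields the serialized Example protos one at a time
    n = sum(1 for _ in tf.python_io.tf_record_iterator(os.path.join(file_path, name)))
    print('{}: {} records'.format(name, n))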
Training the model
With the tfrecord files ready, the next task is to build the training model, which includes reading the tfrecord files, setting up the input queue, and constructing the ResNet network.
First, define the function that reads and decodes the tfrecord files:
def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.string),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.decode_raw(features['label'], tf.float64)
    label = tf.reshape(label, [6 * 26])
    label = tf.cast(label, tf.float32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0  # normalize pixel values to [0, 1]
    return img, label
Here filename is the list of tfrecord file paths. Because the data were written to the tfrecord files as raw byte streams, we have to tell tf.decode_raw the original data type and shape when reading them back. This is also where the image array gets normalized.
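The tf.float64 used for the label is not arbitrary: np.zeros creates a float64 array by default, and tobytes() serializes it in that dtype, so it has to be decoded with the same type. A quick check in numpy:

import numpy as np

vector = np.zeros(6 * 26)
print(vector.dtype)           # float64, numpy's default float type
print(len(vector.tobytes()))  # 156 elements * 8 bytes = 1248 bytes per label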
Next, define the queue and the model in the main function:
def main():
    save_dir = r"F:\resnet_for_captcha\3train\model\train.model"
    batch_size_ = 2
    lr = tf.Variable(0.0001, dtype=tf.float32)
    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    y_ = tf.placeholder(tf.float32, [None, 6 * 26])
    tfrecord_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
    train_list = []
    for file in os.listdir(tfrecord_path):
        train_list.append(os.path.join(tfrecord_path, file))
    min_after_dequeue = 1000
    # Shuffle the examples while batching
    img, label = read_and_decode_tfrecord(train_list)
    img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_,
                                                    capacity=min_after_dequeue + 3 * batch_size_,
                                                    min_after_dequeue=min_after_dequeue)
    pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
    pred = tf.reshape(pred, shape=[-1, 6 * 26])
    # Loss function
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=y_))
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
    # Accuracy (averaged over individual characters)
    predict = tf.reshape(pred, [-1, 6, 26])
    max_idx_p = tf.argmax(predict, 2)
    max_idx_l = tf.argmax(tf.reshape(y_, [-1, 6, 26]), 2)
    correct_pred = tf.equal(max_idx_p, max_idx_l)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    saver = tf.train.Saver()
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # saver.restore(sess, r'F:\resnet_for_captcha\3train\model\train.model-60000')
        # Create a coordinator to manage the queue threads
        coord = tf.train.Coordinator()
        # Start the QueueRunners; the filename queue is now being filled
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        i = 60000
        cycle_num = 0
        while True:
            i += 1
            b_image, b_label = sess.run([img_batch, label_batch])
            _, loss_ = sess.run([optimizer, loss], feed_dict={x: b_image, y_: b_label})
            if i % 20 == 0:
                print('step: {}, loss: {}'.format(i, loss_))
            if i % 100 == 0:
                _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
                print('--------------------------------------------------------')
                print('step: {} train_acc: {} loss: {}'.format(i, acc_train, _loss))
                print('--------------------------------------------------------')
            if i % 20000 == 0:
                saver.save(sess, save_dir, global_step=i)
                cycle_num += 1
            if cycle_num == 10:
                break
        coord.request_stop()
        # join() returns only after all other threads have shut down
        coord.join(threads)
The queue here uses shuffled batching (tf.train.shuffle_batch), and the network is slim's resnet_v2_50 model. Queue threads are managed by a Coordinator; the loss is printed every 20 steps, the training accuracy every 100 steps, and a checkpoint is saved every 20,000 steps. batch_size_ = 2 is an empirical value that I found to work well after some testing. In the end I trained for 140,000 steps in total and then stopped.
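The commented-out saver.restore line is how a run can be resumed from an earlier checkpoint. A minimal sketch, meant to sit in place of that line inside the with tf.Session block, would look up the newest checkpoint with tf.train.latest_checkpoint:

ckpt = tf.train.latest_checkpoint(r'F:\resnet_for_captcha\3train\model')
if ckpt:
    # e.g. ckpt == r'F:\resnet_for_captcha\3train\model\train.model-60000'
    saver.restore(sess, ckpt)
    i = int(ckpt.split('-')[-1])  # continue the step counter from the checkpoint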
The complete code for this part:
import os
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
import datetime

config = tf.ConfigProto()
config.gpu_options.allow_growth = True


def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.string),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.decode_raw(features['label'], tf.float64)
    label = tf.reshape(label, [6 * 26])
    label = tf.cast(label, tf.float32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0  # normalize pixel values to [0, 1]
    return img, label


def main():
    save_dir = r"F:\resnet_for_captcha\3train\model\train.model"
    batch_size_ = 2
    lr = tf.Variable(0.0001, dtype=tf.float32)
    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    y_ = tf.placeholder(tf.float32, [None, 6 * 26])
    tfrecord_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
    train_list = []
    for file in os.listdir(tfrecord_path):
        train_list.append(os.path.join(tfrecord_path, file))
    min_after_dequeue = 1000
    # Shuffle the examples while batching
    img, label = read_and_decode_tfrecord(train_list)
    img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_,
                                                    capacity=min_after_dequeue + 3 * batch_size_,
                                                    min_after_dequeue=min_after_dequeue)
    pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
    pred = tf.reshape(pred, shape=[-1, 6 * 26])
    # Loss function
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=y_))
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
    # Accuracy (averaged over individual characters)
    predict = tf.reshape(pred, [-1, 6, 26])
    max_idx_p = tf.argmax(predict, 2)
    max_idx_l = tf.argmax(tf.reshape(y_, [-1, 6, 26]), 2)
    correct_pred = tf.equal(max_idx_p, max_idx_l)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    saver = tf.train.Saver()
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # saver.restore(sess, r'F:\resnet_for_captcha\3train\model\train.model-60000')
        # Create a coordinator to manage the queue threads
        coord = tf.train.Coordinator()
        # Start the QueueRunners; the filename queue is now being filled
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        i = 60000
        cycle_num = 0
        while True:
            i += 1
            b_image, b_label = sess.run([img_batch, label_batch])
            _, loss_ = sess.run([optimizer, loss], feed_dict={x: b_image, y_: b_label})
            if i % 20 == 0:
                print('step: {}, loss: {}'.format(i, loss_))
            if i % 100 == 0:
                _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
                print('--------------------------------------------------------')
                print('step: {} train_acc: {} loss: {}'.format(i, acc_train, _loss))
                print('--------------------------------------------------------')
            if i % 20000 == 0:
                saver.save(sess, save_dir, global_step=i)
                cycle_num += 1
            if cycle_num == 10:
                break
        coord.request_stop()
        # join() returns only after all other threads have shut down
        coord.join(threads)


if __name__ == '__main__':
    # Measure total running time
    starttime = datetime.datetime.now().timestamp()
    main()
    endtime = datetime.datetime.now().timestamp()
    print(starttime)
    print(endtime)
    run_hour = (endtime - starttime) / 3600
    print('Ran for {} hours in total!'.format(run_hour))
Note: setting config = tf.ConfigProto() with config.gpu_options.allow_growth = True makes TensorFlow allocate GPU memory on demand instead of grabbing all of it up front, so the GPU is used more sensibly; at the end we also measure how long the program ran.
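As an alternative, if you would rather cap the process at a fixed share of GPU memory instead of letting it grow on demand, ConfigProto also exposes per_process_gpu_memory_fraction (a sketch; the 0.8 is just an example value):

import tensorflow as tf

config = tf.ConfigProto()
# Allow the process to use at most ~80% of the GPU's memory
config.gpu_options.per_process_gpu_memory_fraction = 0.8
sess = tf.Session(config=config)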
Prediction models
We build two prediction scripts for this part: one predicts the 3430 captchas we trained on, to see how accurate the model is on the data it has learned; the other predicts 10 out-of-sample captcha images, to check how well the model generalizes.
Predicting the training set
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
import os
import numpy as np


def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.string),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.decode_raw(features['label'], tf.float64)
    label = tf.reshape(label, [6 * 26])
    label = tf.cast(label, tf.float32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0  # normalize pixel values to [0, 1]
    return img, label


# Convert an index vector back into the label string
def vec2name(vec):
    name = []
    for i in vec:
        a = chr(i + 97)
        name.append(a)
    return "".join(name)


model_dir = r'F:\resnet_for_captcha\3train\model\train.model-140000'
tfrecord_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
train_list = []
for file in os.listdir(tfrecord_path):
    train_list.append(os.path.join(tfrecord_path, file))

x = tf.placeholder(tf.float32, [None, 224, 224, 3])
y_ = tf.placeholder(tf.float32, [None, 6 * 26])
batch_size_ = 1
min_after_dequeue = 1000
# Read the examples in order (no shuffling)
img, label = read_and_decode_tfrecord(train_list)
img_batch, label_batch = tf.train.batch([img, label], num_threads=2, batch_size=batch_size_,
                                        capacity=min_after_dequeue + 3 * batch_size_)
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
predict = tf.reshape(pred, [-1, 6, 26])
max_idx_p = tf.argmax(predict, 2)
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, model_dir)
    coord = tf.train.Coordinator()
    # Start the QueueRunners; the filename queue is now being filled
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    nn = 0
    count_true_num = 0
    count_false_num = 0
    while True:
        nn += 1
        b_image, b_label = sess.run([img_batch, label_batch])
        pre_index = sess.run(max_idx_p, feed_dict={x: b_image})
        vec = pre_index[0].tolist()
        predict_text = vec2name(vec)
        max_idx_l = np.argmax(np.reshape(b_label, [-1, 6, 26]), 2)
        vec1 = max_idx_l[0].tolist()
        true_text = vec2name(vec1)
        print('{} ground truth: {} prediction: {}'.format(nn, true_text, predict_text))
        if true_text == predict_text:
            count_true_num += 1
        else:
            count_false_num += 1
        if nn == 3430:
            break
    print('correct: {} wrong: {} accuracy: {}'.format(count_true_num, count_false_num,
                                                      count_true_num / (count_true_num + count_false_num)))
    coord.request_stop()
    # join() returns only after all other threads have shut down
    coord.join(threads)
Because this is the training set, the queue no longer shuffles; tf.train.batch reads the examples in order. With the 140,000-step checkpoint, the model gets 99.15% of the training captchas exactly right, and further training would raise the accuracy even more.
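Note that this 0.9915 is whole-captcha accuracy: a prediction only counts as correct when all six characters match, whereas the accuracy printed during training is averaged over individual characters. If you also want the per-character rate in this loop, a tiny helper like the hypothetical char_accuracy below (not part of the original script) would do:

def char_accuracy(true_text, predict_text):
    # Fraction of positions where the predicted character matches the true one
    hits = sum(1 for a, b in zip(true_text, predict_text) if a == b)
    return hits / len(true_text)

print(char_accuracy('abcdef', 'abcdxy'))  # -> 0.666...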
Predicting the test set
For the test set I prepared 10 out-of-sample captcha images. Since there are so few of them, there is no need to build tfrecord files.
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
from PIL import Image
import os
import numpy as np


# Convert an index vector back into the label string
def vec2name(vec):
    name = []
    for i in vec:
        a = chr(i + 97)
        name.append(a)
    return "".join(name)


model_dir = r'F:\resnet_for_captcha\3train\model\train.model-140000'
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
predict = tf.reshape(pred, [-1, 6, 26])
max_idx_p = tf.argmax(predict, 2)
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, model_dir)
    test_dir = r'F:\resnet_for_captcha\test'
    for pic in os.listdir(test_dir):
        pic_path = os.path.join(test_dir, pic)
        img = Image.open(pic_path)
        # Same preprocessing as for the training images: bool -> 0/255, resize, copy to 3 channels
        arr = np.array(img).astype('uint8') * 255
        im = Image.fromarray(arr)
        im = im.resize((224, 224))
        arr = np.array(im)
        xx = np.zeros([224, 224, 3])
        for ii in range(224):
            for jj in range(224):
                xx[ii, jj, :] = arr[ii, jj]
        img1 = Image.fromarray(xx.astype('uint8'))
        img2 = tf.reshape(img1, [1, 224, 224, 3])
        img3 = tf.cast(img2, tf.float32) / 255.0
        name = os.path.splitext(pic)[0]
        b_image = sess.run(img3)
        t_label = sess.run(max_idx_p, feed_dict={x: b_image})
        vec = t_label[0].tolist()
        predict_text = vec2name(vec)
        print('ground truth: {} prediction: {}'.format(name, predict_text))
The results are as follows:
We get 100% accuracy on these 10 captchas, far better than the earlier CNN's predictions.
Summary
This article walked through preprocessing the captcha images, building tfrecord files, and constructing the training and prediction models. Recognizing captchas with a ResNet network achieves very high accuracy, which shows how general and effective ResNet is. The code for this article is on my github; all of the code and the trained models are also uploaded here, password: f3k7. Some details are not covered in depth, and a basic familiarity with TensorFlow is assumed; if you have questions, please leave a comment below.