Hands-On Captcha Recognition with ResNet: A High-Accuracy Machine Learning Model

Original article: https://finthon.com/resnet-captcha/

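Introduction

In the earlier article, Quickly Recognizing Captchas with Python + TensorFlow CNNs, we used a simple 3-layer convolutional neural network to recognize captchas, and the accuracy was not especially high. Although I use ResNet often, I had never found the time to try it on that captcha project. So I spent an afternoon working through the project from start to finish. In this article I describe in detail how to process the images, build tfrecord files, and set up the training and prediction models, ending with a high-accuracy model.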

Environment:

Python 3.6.5
TensorFlow 1.8.0 (GPU version)

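Image processing

I reuse the captchas from the earlier article for this test; after all, it took six hours to type out their labels by hand (3430 images). The original captchas look like this: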

[Image: an original captcha]

After binarization, we get the following image:

[Image: the captcha after binarization]

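The processing steps are described in the earlier article.

Resizing the images

Today's project starts from these processed captchas. Because we use the ResNet50 network from the slim framework, whose default input size is 224×224×3, the images must first be resized to a uniform size to simplify the later steps.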

import os
import numpy as np
from PIL import Image
 
 
# Directory containing the binarized captchas
captcha_path = r'F:\resnet_for_captcha\captcha4'
# Directory where the resized images are saved
save_path = r'F:\resnet_for_captcha\1resize\resize_path'
for i in os.listdir(captcha_path):
    img = Image.open(os.path.join(captcha_path, i))
    arr = np.array(img) * 255       # np.array(img) is boolean here, a leftover from the earlier binarization
    im = Image.fromarray(arr)
    im = im.resize((224, 224))
    arr = np.array(im)
    x = np.zeros([224, 224, 3])    # create a 224*224*3 array
    for ii in range(224):
        for jj in range(224):
            x[ii, jj, :] = arr[ii, jj]
    im = Image.fromarray(x.astype('uint8'))     # image arrays must be uint8
    im.save(os.path.join(save_path, i))

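A few points to note here. The original images are 114×450 and must become 224×224×3, which adds a dimension; the values along that third dimension are all identical. The image array must also be converted to uint8, which represents pixel values in the range 0-255. After conversion, the image looks like this: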

[Image: the captcha resized to 224×224]

Although the image is stretched somewhat, this is acceptable for the machine learning model.
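The per-pixel loop in the resize script copies one grayscale value into all three channels. As a minimal sketch of the same step, NumPy can do the copy in a single call. The tiny 2×3 array below is a made-up stand-in for the resized 224×224 image, and np.stack is a swapped-in alternative, not what the original script uses.

```python
import numpy as np

# Made-up 2x3 grayscale "image" standing in for the resized 224x224 array
arr = np.array([[0, 255, 0],
                [255, 0, 255]])

# Loop version, following the resize script
h, w = arr.shape
x = np.zeros([h, w, 3])
for ii in range(h):
    for jj in range(w):
        x[ii, jj, :] = arr[ii, jj]
looped = x.astype('uint8')

# Vectorized version: stack the same plane three times along a new last axis
stacked = np.stack([arr, arr, arr], axis=-1).astype('uint8')

print(stacked.shape, stacked.dtype)
print(np.array_equal(looped, stacked))
```

Both versions give the same uint8 array; the stacked one simply avoids running 224 × 224 Python-level assignments per image.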

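The tfrecord format

To make training more efficient, we build an input queue so that the model spends less time waiting for data to be read. TensorFlow provides the tfrecord file format, which, combined with queues, makes reading data convenient. After resizing the captcha images, we convert them to tfrecord files.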

import os
import tensorflow as tf
from PIL import Image
import numpy as np
 
 
# Convert a captcha name (6 lowercase letters) into a one-hot encoded array
def name2vec(name):
    vector = np.zeros(6 * 26)
    for i, c in enumerate(name):
        idx = i * 26 + ord(c) - 97
        vector[idx] = 1
    return vector
 
 
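# Directory of the resized images
cwd = r'F:\resnet_for_captcha\1resize\resize_path'
 
# Output directory for the tfrecord files
file_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
 
# Number of images stored in each tfrecord file
bestnum = 1000
 
# Number of images written to the current file so far
num = 0
 
# Index of the current TFRecord file
recordfilenum = 0
 
# Name of the current tfrecords file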
ftrecordfilename = ("train.tfrecords-%.3d" % recordfilenum)
writer = tf.python_io.TFRecordWriter(os.path.join(file_path, ftrecordfilename))
 
for i in os.listdir(cwd):
    num += 1
    print(num)
    if num > bestnum:
        num = 1
        recordfilenum += 1
        writer.close()      # close the finished file before starting the next one
        ftrecordfilename = ("train.tfrecords-%.3d" % recordfilenum)
        writer = tf.python_io.TFRecordWriter(os.path.join(file_path, ftrecordfilename))
 
    name = os.path.splitext(i)[0]
    name_vec = name2vec(name).tobytes()         # convert to raw bytes
    img = Image.open(os.path.join(cwd, i), 'r')
    img_raw = img.tobytes()
    example = tf.train.Example(
        features=tf.train.Features(feature={
            'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[name_vec])),
            'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
        }))
    writer.write(example.SerializeToString())
writer.close()

We write one tfrecord file per 1000 images, storing both each image's pixel array and its label as raw byte strings in the file. The resulting tfrecord files look like this:

[Image: the generated tfrecord files]
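To see how the 1000-images-per-file rule plays out for the 3430 captchas, the counter logic of the conversion script can be replayed without TensorFlow. This is only an illustrative sketch: shard_names is a hypothetical helper that mirrors the num / recordfilenum bookkeeping and writes nothing to disk.

```python
def shard_names(num_images, per_file=1000):
    """Return, for each image, the tfrecord file name it would be written to."""
    names = []
    num = 0              # images written to the current file so far
    recordfilenum = 0    # index of the current file
    for _ in range(num_images):
        num += 1
        if num > per_file:
            num = 1
            recordfilenum += 1
        names.append("train.tfrecords-%.3d" % recordfilenum)
    return names


names = shard_names(3430)
files = sorted(set(names))
print(files)                        # four file names, -000 through -003
print(names.count(files[-1]))       # the last file holds the remaining 430 images
```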

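Training the model

With the tfrecord files ready, the next task is to build the training script, which includes reading the tfrecord files, setting up the queue, and constructing the ResNet network.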
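The point of the queue is that reader threads keep a buffer filled while the training loop consumes from it, so the two overlap instead of taking turns. The sketch below shows that idea in plain Python with the standard queue and threading modules. It is an analogy only, not how TensorFlow implements its queues, and the names producer and consume are made up for the illustration.

```python
import queue
import threading


def producer(q, items):
    # Stand-in for the reader threads: load examples and enqueue them.
    for item in items:
        q.put(item)          # blocks while the buffer is full
    q.put(None)              # sentinel: no more data


def consume(q):
    # Stand-in for the training loop: dequeue examples as they are needed.
    seen = []
    while True:
        item = q.get()
        if item is None:
            break
        seen.append(item)
    return seen


q = queue.Queue(maxsize=8)   # bounded buffer, comparable in spirit to `capacity`
worker = threading.Thread(target=producer, args=(q, range(20)))
worker.start()
received = consume(q)
worker.join()
print(len(received))
```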

First, define the function that reads the tfrecord files:

def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.string),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.decode_raw(features['label'], tf.float64)
    label = tf.reshape(label, [6 * 26])
    label = tf.cast(label, tf.float32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0    # normalize to [0, 1]
    return img, label

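filename holds the paths of the tfrecord files. Because we stored the data in the tfrecord files as raw bytes, we must specify the data type and shape when reading it back. The label comes from an array created with np.zeros, which defaults to float64, so it is decoded as tf.float64; the image was saved as uint8. Here we also normalize the image array.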
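The need to match dtype and shape can be checked with NumPy alone. The following is a small sketch, using a made-up label "abcxyz", of the round trip from one-hot vector to bytes and back; np.frombuffer plays the role that tf.decode_raw plays in the reader function.

```python
import numpy as np


def name2vec(name):
    vector = np.zeros(6 * 26)            # float64 by default
    for i, c in enumerate(name):
        vector[i * 26 + ord(c) - 97] = 1
    return vector


raw = name2vec("abcxyz").tobytes()       # the bytes stored in the tfrecord
print(len(raw))                          # 156 values * 8 bytes each = 1248

# Decoding needs the same dtype and shape that were used when writing
label = np.frombuffer(raw, dtype=np.float64).reshape(6, 26)
decoded = "".join(chr(int(k) + 97) for k in label.argmax(axis=1))
print(decoded)

# Reading the same bytes with the wrong dtype yields the wrong number of values
wrong = np.frombuffer(raw, dtype=np.float32)
print(wrong.size)
```

With float32 the buffer is interpreted as 312 values instead of 156, so the later reshape to 6 × 26 would fail.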

Next, we define the queue and the model in the main function:

def main():
    save_dir = r"F:\resnet_for_captcha\3train\model\train.model"
    batch_size_ = 2
    lr = tf.Variable(0.0001, dtype=tf.float32)
    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    y_ = tf.placeholder(tf.float32, [None, 6 * 26])
 
    tfrecord_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
    train_list = []
    for file in os.listdir(tfrecord_path):
        train_list.append(os.path.join(tfrecord_path, file))
 
    min_after_dequeue = 1000
    # shuffle the examples randomly
    img, label = read_and_decode_tfrecord(train_list)
    img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_,
                                                    capacity=min_after_dequeue + 3 * batch_size_,
                                                    min_after_dequeue=min_after_dequeue)
 
    pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
    pred = tf.reshape(pred, shape=[-1, 6 * 26])
    # define the loss function
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=y_))
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
    # accuracy (per character)
    predict = tf.reshape(pred, [-1, 6, 26])
    max_idx_p = tf.argmax(predict, 2)
    max_idx_l = tf.argmax(tf.reshape(y_, [-1, 6, 26]), 2)
    correct_pred = tf.equal(max_idx_p, max_idx_l)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
 
    saver = tf.train.Saver()
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # saver.restore(sess, r'F:\resnet_for_captcha\3train\model\train.model-60000')
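        # create a coordinator to manage the threads
        coord = tf.train.Coordinator()
        # start the QueueRunners; from here on the filename queue is being filled
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        # step counter; 60000 matches the checkpoint in the commented-out restore above
        # (start from 0 when training from scratch)
        i = 60000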
        cycle_num = 0
        while True:
            i += 1
            b_image, b_label = sess.run([img_batch, label_batch])
            _, loss_ = sess.run([optimizer, loss], feed_dict={x: b_image, y_: b_label})
            if i % 20 == 0:
                print('step: {}, loss: {}'.format(i, loss_))
            if i % 100 == 0:
                _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
                print('--------------------------------------------------------')
                print('step: {}  train_acc: {}  loss: {}'.format(i, acc_train, _loss))
                print('--------------------------------------------------------')
            if i % 20000 == 0:
                saver.save(sess, save_dir, global_step=i)
                cycle_num += 1
            if cycle_num == 10:
                break
        coord.request_stop()
        # join() returns only after all other threads have stopped
        coord.join(threads)

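The queue is a shuffling queue, and the model is resnet_v2_50. Threads manage the queue; the loss is printed every 20 steps, the per-character training accuracy every 100 steps, and the model is saved every 20000 steps. I chose batch_size_ = 2, an empirical value that worked well in my own tests. In the end, following this procedure, I stopped training at 140,000 steps.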
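The loss in the training code treats each of the 6 × 26 outputs as an independent yes/no prediction. As a rough NumPy sketch (not the TensorFlow implementation itself), the numerically stable form documented for tf.nn.sigmoid_cross_entropy_with_logits, max(x, 0) - x*z + log(1 + exp(-|x|)), can be written out as below. The logits and label are made-up values for a single character position.

```python
import numpy as np


def sigmoid_cross_entropy(logits, labels):
    """Element-wise sigmoid cross-entropy in the numerically stable form
    max(x, 0) - x * z + log(1 + exp(-|x|))."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))


# One hypothetical captcha position with 26 letter slots; the true letter is 'c'
labels = np.zeros(26)
labels[2] = 1.0
logits = np.full(26, -4.0)   # made-up logits: low everywhere ...
logits[2] = 4.0              # ... except for the correct letter

per_slot = sigmoid_cross_entropy(logits, labels)
loss = per_slot.mean()       # the training code averages over all outputs

# Cross-check against the textbook definition -z*log(p) - (1-z)*log(1-p)
p = 1.0 / (1.0 + np.exp(-logits))
naive = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
print(np.allclose(per_slot, naive))
```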

The complete code for this part is as follows:

import os
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
import datetime
 
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
 
 
def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.string),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.decode_raw(features['label'], tf.float64)
    label = tf.reshape(label, [6 * 26])
    label = tf.cast(label, tf.float32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0    # normalize to [0, 1]
    return img, label
 
 
def main():
    save_dir = r"F:\resnet_for_captcha\3train\model\train.model"
    batch_size_ = 2
    lr = tf.Variable(0.0001, dtype=tf.float32)
    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    y_ = tf.placeholder(tf.float32, [None, 6 * 26])
 
    tfrecord_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
    train_list = []
    for file in os.listdir(tfrecord_path):
        train_list.append(os.path.join(tfrecord_path, file))
 
    min_after_dequeue = 1000
    # shuffle the examples randomly
    img, label = read_and_decode_tfrecord(train_list)
    img_batch, label_batch = tf.train.shuffle_batch([img, label], num_threads=2, batch_size=batch_size_,
                                                    capacity=min_after_dequeue + 3 * batch_size_,
                                                    min_after_dequeue=min_after_dequeue)
 
    pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
    pred = tf.reshape(pred, shape=[-1, 6 * 26])
    # define the loss function
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred, labels=y_))
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
    # accuracy (per character)
    predict = tf.reshape(pred, [-1, 6, 26])
    max_idx_p = tf.argmax(predict, 2)
    max_idx_l = tf.argmax(tf.reshape(y_, [-1, 6, 26]), 2)
    correct_pred = tf.equal(max_idx_p, max_idx_l)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
 
    saver = tf.train.Saver()
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        # saver.restore(sess, r'F:\resnet_for_captcha\3train\model\train.model-60000')
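        # create a coordinator to manage the threads
        coord = tf.train.Coordinator()
        # start the QueueRunners; from here on the filename queue is being filled
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        # step counter; 60000 matches the checkpoint in the commented-out restore above
        # (start from 0 when training from scratch)
        i = 60000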
        cycle_num = 0
        while True:
            i += 1
            b_image, b_label = sess.run([img_batch, label_batch])
            _, loss_ = sess.run([optimizer, loss], feed_dict={x: b_image, y_: b_label})
            if i % 20 == 0:
                print('step: {}, loss: {}'.format(i, loss_))
            if i % 100 == 0:
                _loss, acc_train = sess.run([loss, accuracy], feed_dict={x: b_image, y_: b_label})
                print('--------------------------------------------------------')
                print('step: {}  train_acc: {}  loss: {}'.format(i, acc_train, _loss))
                print('--------------------------------------------------------')
            if i % 20000 == 0:
                saver.save(sess, save_dir, global_step=i)
                cycle_num += 1
            if cycle_num == 10:
                break
        coord.request_stop()
        # join() returns only after all other threads have stopped
        coord.join(threads)
 
 
if __name__ == '__main__':
    # measure the running time
    starttime = datetime.datetime.now().timestamp()
    main()
    endtime = datetime.datetime.now().timestamp()
    print(starttime)
    print(endtime)
    run_hour = (endtime - starttime) / 3600
    print('Total running time: {} hours'.format(run_hour))

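Note: adding config = tf.ConfigProto() and config.gpu_options.allow_growth = True makes TensorFlow allocate GPU memory on demand rather than reserving all of it at startup. At the end, we compute the program's running time.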

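Prediction models

We build two prediction scripts for this part: one predicts the 3430 images we trained on, to see how accurate the model is after training; the other predicts 10 out-of-sample captcha images, to check how well the model generalizes.

Predicting the training set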

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
import os
import numpy as np
 
 
def read_and_decode_tfrecord(filename):
    filename_deque = tf.train.string_input_producer(filename)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_deque)
    features = tf.parse_single_example(serialized_example, features={
        'label': tf.FixedLenFeature([], tf.string),
        'img_raw': tf.FixedLenFeature([], tf.string)})
    label = tf.decode_raw(features['label'], tf.float64)
    label = tf.reshape(label, [6 * 26])
    label = tf.cast(label, tf.float32)
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [224, 224, 3])
    img = tf.cast(img, tf.float32) / 255.0      # normalize to [0, 1]
    return img, label
 
 
def vec2name(vec):
    name = []
    for i in vec:
        a = chr(i + 97)
        name.append(a)
    return "".join(name)
 
 
model_dir = r'F:\resnet_for_captcha\3train\model\train.model-140000'
tfrecord_path = r'F:\resnet_for_captcha\2to_tfrecord\tfrecord'
 
train_list = []
for file in os.listdir(tfrecord_path):
    train_list.append(os.path.join(tfrecord_path, file))
 
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
y_ = tf.placeholder(tf.float32, [None, 6 * 26])
batch_size_ = 1
min_after_dequeue = 1000
# read in order (no shuffling)
img, label = read_and_decode_tfrecord(train_list)
img_batch, label_batch = tf.train.batch([img, label], num_threads=2, batch_size=batch_size_,
                                                capacity=min_after_dequeue + 3 * batch_size_)
 
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
predict = tf.reshape(pred, [-1, 6, 26])
max_idx_p = tf.argmax(predict, 2)
 
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, model_dir)
    coord = tf.train.Coordinator()
    # start the QueueRunners; from here on the filename queue is being filled
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    nn = 0
    count_true_num = 0
    count_false_num = 0
    while True:
        nn += 1
        b_image, b_label = sess.run([img_batch, label_batch])
        pre_index = sess.run(max_idx_p, feed_dict={x: b_image})
        vec = pre_index[0].tolist()
        predict_text = vec2name(vec)
        max_idx_l = np.argmax(np.reshape(b_label, [-1, 6, 26]), 2)
        vec1 = max_idx_l[0].tolist()
        true_text = vec2name(vec1)
        print('{}  actual: {}   predicted: {}'.format(nn, true_text, predict_text))
        if true_text == predict_text:
            count_true_num += 1
        else:
            count_false_num += 1
 
        if nn == 3430:
            break
    print('correct: {}  wrong: {}  accuracy: {}'.format(count_true_num, count_false_num,
                                       count_true_num / (count_true_num + count_false_num)))
    coord.request_stop()
    # join() returns only after all other threads have stopped
    coord.join(threads)

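Because we are predicting the training set, the queue no longer shuffles; it reads in order with tf.train.batch. With the 140,000-step model, the accuracy on the training set is 0.9915, counting a captcha as correct only when all six characters match. Further training would probably push it higher.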
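Note that the two scripts measure accuracy differently: the training script averages over individual characters, while this prediction script compares whole strings. A small NumPy sketch with made-up letter indices shows how far the two numbers can differ for the same predictions.

```python
import numpy as np


def vec2name(vec):
    return "".join(chr(i + 97) for i in vec)


# Hypothetical letter indices for three 6-letter captchas
true_idx = np.array([[0, 1, 2, 3, 4, 5],
                     [7, 4, 11, 11, 14, 22],
                     [25, 24, 23, 22, 21, 20]])
pred_idx = true_idx.copy()
pred_idx[1, 0] = 10          # one wrong character in the second captcha

match = (true_idx == pred_idx)
per_char_acc = match.mean()                  # as in the training script: 17 of 18
per_captcha_acc = match.all(axis=1).mean()   # as in the prediction script: 2 of 3

print(vec2name(true_idx[1].tolist()), vec2name(pred_idx[1].tolist()))
print(per_char_acc, per_captcha_acc)
```

A single wrong letter costs one eighteenth of the per-character score but a full third of the whole-captcha score, so the whole-captcha figure is the stricter of the two.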

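Predicting the test set

For the test set, I provide 10 out-of-sample captcha images. Since the test set is so small, there is no need to build tfrecord files.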

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
from PIL import Image
import os
import numpy as np
 
 
# Convert a vector of letter indices back into the label name
def vec2name(vec):
    name = []
    for i in vec:
        a = chr(i + 97)
        name.append(a)
    return "".join(name)
 
 
model_dir = r'F:\resnet_for_captcha\3train\model\train.model-140000'
x = tf.placeholder(tf.float32, [None, 224, 224, 3])
 
pred, end_points = nets.resnet_v2.resnet_v2_50(x, num_classes=6 * 26, is_training=True)
predict = tf.reshape(pred, [-1, 6, 26])
max_idx_p = tf.argmax(predict, 2)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, model_dir)
    test_dir = r'F:\resnet_for_captcha\test'
    for pic in os.listdir(test_dir):
        pic_path = os.path.join(test_dir, pic)
        img = Image.open(pic_path)
        arr = np.array(img) * 255
        im = Image.fromarray(arr)
        im = im.resize((224, 224))
        arr = np.array(im)
        xx = np.zeros([224, 224, 3])
        for ii in range(224):
            for jj in range(224):
                xx[ii, jj, :] = arr[ii, jj]
        # Build the input batch with NumPy so that no new graph ops are created for each image
        b_image = xx.astype('uint8').reshape([1, 224, 224, 3]).astype(np.float32) / 255.0
 
        name = os.path.splitext(pic)[0]
 
        t_label = sess.run(max_idx_p, feed_dict={x: b_image})
        vec = t_label[0].tolist()
        predict_text = vec2name(vec)
        print('actual: {}   predicted: {}'.format(name, predict_text))

The results are as follows:

[Image: prediction results for the 10 test captchas]

We reach 100% accuracy on these 10 images, much higher than the earlier results.

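Summary

This article described how to process captcha images, build tfrecord files, and construct the training and prediction models. Using a ResNet network to recognize the captcha images achieved very high accuracy, which shows how broadly applicable and effective ResNet is. The code for this article is on my GitHub, and all of the code and models are available here (password: f3k7). I did not go into much detail on the finer points and assume you have some TensorFlow background; if you have questions, please leave a comment below.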