A Helper Tool for Building Self-Collected Face Datasets: Face Occlusion Annotation

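Overview

When researching deep learning algorithms for face attribute recognition, we use not only open-source labeled data but also, depending on the specific scenario and requirements, a large amount of self-collected image data (open-source images, crawled images, self-taken photos, and so on). These images generally come without face attribute labels. Since the training data for any face detection algorithm is ultimately images plus labels, annotating such task-specific data quickly is the key part of the data collection work. This article shares how to use a Python tool to assist in annotating face occlusion data.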

Defining the Annotation Target

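Images to annotate: photos containing a face (a single face; the face region takes up a sufficiently large share of the image; various scenes, including both occluded and unoccluded faces)
Attributes to annotate: occlusion flags for the 7 main face regions, where 0 means not occluded and 1 means occluded (as shown in the figure below)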

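Figure: division of the face into occlusion regions
Label file: a txt text file
Label line format:

image_relative_path left_eye_flag right_eye_flag nose_flag mouth_flag chin_flag left_cheek_flag right_cheek_flag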

Data naming convention: the image root directory has the same name as the label file (apart from the file extension)
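The label line format above is simple enough to parse and validate in a few lines of Python. The following is a minimal sketch and not part of the original tool: the names `REGIONS` and `parse_label_line` are hypothetical, and it assumes the image path contains no whitespace.

```python
# Hypothetical helper (not from the original project): parse one label line.
# Region order follows the label line format described above.
REGIONS = ["left_eye", "right_eye", "nose", "mouth",
           "chin", "left_cheek", "right_cheek"]


def parse_label_line(line):
    """Split 'path f1 ... f7' into (path, {region: flag}).

    Assumes the image path contains no whitespace.
    """
    parts = line.split()
    if len(parts) != 1 + len(REGIONS):
        raise ValueError("expected a path followed by 7 flags: %r" % line)
    path, flags = parts[0], parts[1:]
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError("flags must be 0 or 1: %r" % line)
    return path, dict(zip(REGIONS, (int(flag) for flag in flags)))
```

Running such a check over a finished label file is one way to catch lines with a missing flag or a mistyped value before they reach training.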

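Key Techniques for Building the Tool

Displaying an Image Without Axes

Function: display the image cleanly inside a single widget, with all distracting decorations removed
Key code: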

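from PIL import Image
import matplotlib.pyplot as plt

# Display the image to be annotated (img_path is the path of the current image)
im = Image.open(img_path)
plt.imshow(im)
plt.xticks([])  # hide the x-axis tick values
plt.yticks([])  # hide the y-axis tick values
plt.axis('off')  # hide the axes frame
plt.gca().xaxis.set_major_locator(plt.NullLocator())
plt.gca().yaxis.set_major_locator(plt.NullLocator())
# Remove all padding so that the image fills the figure
plt.subplots_adjust(top=1, bottom=0, right=1, left=0, hspace=0, wspace=0)
plt.margins(0, 0)
plt.show()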

Traversing the Images to Annotate

Function: iterate over the images to annotate, displaying and annotating them one by one
Key code:

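import os
import re


def MarkToolWithImg(wait_mark_image_root_path, output_label_txt_path):
    """
        Annotate images and generate the label file
        :param wait_mark_image_root_path: root directory of the images to annotate
        :param output_label_txt_path: path of the output label file
        :return:
    """
    # Read the log once; it records the images that were already annotated
    log_text = ""
    if os.path.exists('Face_data_mark.log'):
        with open('Face_data_mark.log', 'r') as f:
            log_text = f.read()
    for parent, dirnames, filenames in os.walk(wait_mark_image_root_path):
        for filename in filenames:
            img_path = os.path.join(parent, filename)
            # Deduplicate: skip images whose file name already appears in the log
            matchObj = re.search(re.escape(filename), log_text, re.M | re.I)
            if matchObj:
                print(img_path + " has already been annotated")
            else:
                print("Annotating: " + str(img_path))
                imgShowAndMark(img_path, output_label_txt_path, None)


def MarkToolWithTxt(label_txt_path):
    """
        Locate images from the label file and update their annotations
        :param label_txt_path: path of the existing label file
        :return:
    """
    # Read all label lines up front, because imgShowAndMark may rewrite this file
    with open(label_txt_path, 'r') as tt:
        label_lines = tt.readlines()
    for num_of_line, line in enumerate(label_lines, start=1):
        num = line.strip().split()
        if not num:
            continue  # skip blank lines
        img_path = num[0]
        # Deduplicate: re-read the log so that entries added during this run are seen
        with open('Face_data_mark.log', 'r') as f:
            a = f.readlines()
        current_img_name = os.path.basename(img_path)
        print(current_img_name)
        is_mark = False
        for c in a:
            if current_img_name.strip() in c:
                print(img_path + " has already been annotated")
                is_mark = True
                break
        if is_mark is False:
            print("Annotating: " + str(img_path))
            imgShowAndMark(img_path, label_txt_path, num_of_line)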
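
The traversal code above hands every file under the root directory to the annotation step, including any non-image files that happen to be there. One possible refinement is to filter by file extension first. The following is a minimal sketch under stated assumptions: the helper name `iter_image_paths` and the extension list `IMAGE_EXTS` are hypothetical and not part of the original tool.

```python
import os

# Assumed set of image extensions; adjust to the data actually collected.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def iter_image_paths(root, exts=IMAGE_EXTS):
    """Yield the paths of image files under root in a stable, sorted order."""
    for parent, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk visit subdirectories in a fixed order
        dirnames.sort()
        for filename in sorted(filenames):
            # str.endswith accepts a tuple of suffixes
            if filename.lower().endswith(exts):
                yield os.path.join(parent, filename)
```

A fixed traversal order also makes it easier to resume an interrupted annotation session, since the images come up in the same sequence on every run.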

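Setting and Saving the Occlusion State of Each Face

Function: inspect the face in each image, enter the occlusion state of the seven regions by hand, and save the result to the label file.
Key code: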

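import logging

if relaceLineNum is not None:
    # Update mode: show the existing label and ask for new values
    with open(label_txt_path, 'r') as fr:
        lines = fr.readlines()
    newLine = myInput("Enter the 7 occlusion flags for the current image; the existing label is: " + str(lines[relaceLineNum - 1]))
    print("You entered: {}".format(newLine).strip())
    # Replace the corresponding line in the label file
    replaceLine(label_txt_path, relaceLineNum, str(img_path) + " " + newLine)
else:
    # New-label mode: ask for the flags and append a new line to the label file
    newLine = myInput("Enter the 7 occlusion flags for the current image")
    print("You entered: {}".format(newLine).strip())
    with open(label_txt_path, "a") as ot:
        # Write the path together with the flags so the line matches the label format
        ot.write(str(img_path) + " " + newLine.strip() + "\n")
# Record the image in the log; the traversal code reads it to skip annotated images
logging.info("Annotated: " + str(img_path) + " flags: " + str(newLine).strip())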
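
The snippet above calls `myInput` and `replaceLine`, whose bodies are not shown in this article. The sketch below illustrates one way such helpers might look. It is an assumption for illustration only, not the project's actual implementation, and the names `read_flags` and `replace_line` are hypothetical.

```python
def read_flags(prompt, ask=input):
    """Prompt until exactly seven 0/1 flags are entered; return them space-separated."""
    while True:
        parts = ask(prompt).split()
        if len(parts) == 7 and all(p in ("0", "1") for p in parts):
            return " ".join(parts)
        print("Please enter seven values, each 0 or 1, separated by spaces.")


def replace_line(txt_path, line_num, new_text):
    """Replace the 1-based line `line_num` of a text file with `new_text`."""
    with open(txt_path, "r", encoding="utf-8") as fr:
        lines = fr.readlines()
    if not 1 <= line_num <= len(lines):
        raise IndexError("line %d is out of range" % line_num)
    # Normalize the line ending so the file keeps one label per line
    lines[line_num - 1] = new_text.rstrip("\n") + "\n"
    with open(txt_path, "w", encoding="utf-8") as fw:
        fw.writelines(lines)
```

Validating the input at entry time keeps malformed lines out of the label file, and passing the prompt function as the `ask` parameter makes the helper easy to exercise without a keyboard.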

Full Project Repository for the Annotation Tool

OcclusionAnnotation

Using the Tool

Figure: wait_label_img

Figure: label_info

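With that, our face occlusion annotation tool is complete. It addresses the difficulty of annotating face occlusion information and considerably speeds up the annotation work. If you know of a better approach, feel free to discuss it in the comments.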
