[Formula Detection] 4 ScanSSD WINDOW-LEVEL DETECTION

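I have recently been working on a formula detection project. This series introduces a method for detecting formulas in scanned documents, in three parts: first, a reading of the paper; second, a walkthrough of the open-source code; third, a reproduction on a local dataset. The paper and the code are available at: paper link, code link.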

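In the previous article (Formula Detection -- ScanSSD Abstract), we briefly covered the abstract of the paper. In this article we move on to the WINDOW-LEVEL DETECTION section.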

SCANSSD: WINDOW-LEVEL DETECTION

First, we use a sliding window to sample overlapping sub-images from the document page image. We then pass each window to a Single-Shot Detector (SSD [3]) to locate formula regions.

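Key terms: sample overlapping sub-images (crop sub-images that overlap one another).

Notes: overlapping sub-images are first sampled from the document page image. Each sub-image is then passed through the SSD network to locate formula regions.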

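SSD simultaneously evaluates multiple formula region candidates laid out in a grid (see Figure 1), and then applies non-maximal suppression (NMS) to select the window-level detections.

Key terms: simultaneously (at the same time), laid out in a grid (arranged on a grid), non-maximal suppression (keep the best-scoring box and discard boxes that overlap it heavily).

Paraphrase: SSD scores many candidate formula regions on a grid at once, then uses non-maximal suppression to select the window-level detections.

Notes: "window-level detections" roughly means the formula regions detected inside a single window.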
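To make the NMS step concrete, here is a minimal greedy NMS sketch in plain Python. The box format `(x1, y1, x2, y2)`, the function names, and the IoU threshold of 0.45 are assumptions chosen for illustration; the paper's own settings may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate candidates and one separate candidate (made-up values).
boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (300, 300, 400, 340)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.89, so only the indices of the first and third boxes are kept.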

A. Sliding Windows

To produce sub-images for use in detection, starting from a 600 dpi page image we slide a 1200 × 1200 window with a vertical and horizontal stride (shift) of 120 pixels (10% of window size).

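Paraphrase: to produce sub-images for detection, a 1200 × 1200 window is slid over the 600 dpi page image with a stride of 120 pixels in both the vertical and horizontal directions.

Notes: a stride of 120 pixels means that each window is 1200 × 1200 and consecutive windows are shifted by 120 pixels, so neighbouring windows overlap by 90%.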
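The window layout can be sketched as follows. The window size and stride come from the paper; the function name `window_origins`, the choice to snap the last window to the page edge, and the 5100 × 6600 page size (an 8.5 × 11 inch page at 600 dpi) are assumptions for illustration, since the text above does not say how page borders are handled.

```python
def window_origins(page_w, page_h, win=1200, stride=120):
    """Top-left corners of the windows needed to cover a page."""
    def starts(length):
        if length <= win:
            return [0]
        last = length - win
        positions = list(range(0, last + 1, stride))
        # Assumption: add one final window flush with the page edge
        # when the stride does not land on it exactly.
        if positions[-1] != last:
            positions.append(last)
        return positions

    return [(x, y) for y in starts(page_h) for x in starts(page_w)]


origins = window_origins(5100, 6600)  # hypothetical page size in pixels
print(len(origins), origins[0], origins[-1])
```

Under these assumptions a single page yields well over a thousand windows, which is why the sub-image count grows so quickly.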

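Our windows are roughly 10 text lines in height, which makes math formulas large enough for SSD to detect them reliably. The SSD detector is trained using ground truth math regions cropped at the boundary of each window, after scaling and translating formula bounding boxes appropriately.

Key terms: cropped (clipped; here, a box is cut off where it crosses the window boundary).

Paraphrase: SSD is trained on ground-truth formula boxes that are clipped at each window's boundary, after the boxes have been scaled and translated appropriately.

Notes: isn't the ground truth known in advance? Why do the boxes need to be scaled and translated? Most likely because the annotations are given in page coordinates: each box has to be shifted by the window's offset, clipped where it leaves the window, and rescaled if the window is resized for the network input.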
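A minimal sketch of that coordinate conversion, assuming boxes are `(x1, y1, x2, y2)` in page pixels and that each 1200 × 1200 window is resized to a 512 × 512 network input. The function name `box_to_window` and the 512 input size are illustrative assumptions, not the authors' code.

```python
def box_to_window(box, win_x, win_y, win=1200, net=512):
    """Map a page-level box into one window's network-input coordinates.

    Returns None when the box does not overlap the window.
    """
    x1, y1, x2, y2 = box
    # Translate: page coordinates -> window coordinates.
    x1, x2 = x1 - win_x, x2 - win_x
    y1, y2 = y1 - win_y, y2 - win_y
    # Crop at the window boundary.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(win, x2), min(win, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    # Scale: window pixels -> network-input pixels.
    s = net / win
    return (x1 * s, y1 * s, x2 * s, y2 * s)


formula = (1100, 600, 1500, 660)              # made-up page-level box
inside = box_to_window(formula, 600, 480)     # fully inside this window
clipped = box_to_window(formula, 0, 0)        # cut by the right edge
outside = box_to_window(formula, 2400, 0)     # no overlap -> None
```

The same formula therefore produces different training boxes in different windows, and in some windows only a clipped part of it.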

Advantages. There are four main advantages to using sliding windows. The first is data augmentation: only 569 page images are available in the training set, which is very small for training a deep neural network. Our sliding windows produce 656,717 sub-images.

Key terms: augmentation (enlarging the training set).

Notes: sliding windows have four advantages. The first is data augmentation: the original training set has only 569 page images, and sliding windows turn them into 656,717 sub-images.

Second, converting the original page image directly to 300 × 300 or 512 × 512 loses a great deal of visual information, and when we tried to detect formulas using subsampled page images recall was extremely low.

Key terms: visual information, subsampled (downsampled).

Second advantage: shrinking the original page directly to 300 × 300 or 512 × 512 loses a lot of visual information, and detecting formulas on such downsampled pages gives very low recall.

Notes: downsampling an M × N image by a factor of s yields an (M/s) × (N/s) image.

Third, as we maintain the overlap between windows, the network sees formulas multiple times, and has multiple chances to detect a formula. This helps increase recall, because formulas appear in more regions of detection windows.

Key terms: maintain (keep).

Third advantage: because the windows overlap, the same formula appears in several windows, so the network sees it several times and has several chances to detect it, which raises recall.

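Finally, Liu et al. [3] mention that SSD is challenged when detecting small objects. Formulas with just one or two characters are common, but also small. Using high-resolution sub-images increases the relative size of math regions, which makes it easier for SSD to detect them.

Key terms: high-resolution.

Fourth advantage: SSD has difficulty with small objects, and many formulas consist of only one or two characters. Using high-resolution sub-images increases the relative size of the formula regions, which helps SSD detect them.

Notes: the relative size increases because a formula occupies a much larger fraction of a 1200 × 1200 window than of the whole page, so it still covers many pixels after the window is resized for the network. It is not because the formula appears in several windows.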
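A quick back-of-the-envelope check of the "relative size" argument. The page size (8.5 × 11 inches at 600 dpi, i.e. 5100 × 6600 pixels), the 512 network input, and the 60-pixel symbol height are assumptions picked only to make the arithmetic concrete.

```python
PAGE_W, PAGE_H = 5100, 6600   # assumed page size in pixels at 600 dpi
WINDOW = 1200                 # window size from the paper
NET = 512                     # assumed network input size
SYMBOL_H = 60                 # assumed height of a small symbol, in page pixels

# Whole page squeezed into the network input: vertical scale factor.
page_scale_y = NET / PAGE_H
# One window resized to the network input.
window_scale = NET / WINDOW

symbol_on_page_input = SYMBOL_H * page_scale_y
symbol_on_window_input = SYMBOL_H * window_scale
print(round(symbol_on_page_input, 1), round(symbol_on_window_input, 1))
# prints: 4.7 25.6
```

With these numbers the symbol is under 5 pixels tall when the whole page is downsampled, but over 25 pixels tall inside a resized window, a factor of 6600 / 1200 = 5.5.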

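Disadvantages. There are also a few disadvantages to using sliding windows versus detection within a single page image. The first is increased computational cost; this can be mitigated through parallelization, as each window may be processed independently.

Key terms: increased computational cost, mitigated through parallelization (eased by processing in parallel).

First disadvantage: higher computational cost (more time). This can be eased through parallelization, because each window can be processed independently.

Notes: processing windows in parallel should reduce the elapsed time, although the total amount of computation stays the same.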
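Since each window is independent, the work can be spread over several workers. The sketch below uses Python's standard `concurrent.futures`; `detect_in_window` is a hypothetical placeholder that returns one fake box per window, standing in for a real detector. In practice one would more likely batch windows on a GPU, so treat this only as an illustration of the independence property.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_in_window(window):
    """Placeholder detector (hypothetical): one fake box per window origin."""
    x, y = window
    return [(x + 10, y + 10, x + 50, y + 30)]


def detect_all(windows, workers=4):
    """Run the detector on every window; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(detect_in_window, windows))


windows = [(0, 0), (120, 0), (240, 0)]
results = detect_all(windows)
```

Because no window depends on another window's result, the order of execution does not matter; only the per-window outputs need to be collected afterwards.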

Secondly, windowing cuts formulas if they do not fit in a window. This means that a large expression may be split into multiple sub-images; this makes it impossible to train the SSD network to detect large math expressions directly. To mitigate this issue, we train the network to detect formulas across windows. Furthermore, windowing requires that we stitch (combine) results from individual windows to obtain detection results at the level of the original page. We discuss how we address these problems using pooling methods in section V.

Key terms: to mitigate this issue (to ease this problem).

Second disadvantage: a large formula may be split across several windows. To ease this, the network is trained to detect formulas across windows. In addition, the results from individual windows have to be stitched together to obtain page-level detections.

Notes: how exactly is the model trained to detect formulas across windows? The answer should come later in the paper, which points to Section V.
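Before any pooling can happen, window-level boxes have to be mapped back into a common page coordinate frame. The sketch below shows only that first step, assuming boxes are predicted in a 512 × 512 network input for a 1200 × 1200 window; the function names and data layout are illustrative assumptions, and the actual merging (pooling) strategy from Section V is not reproduced here.

```python
def window_box_to_page(box, win_x, win_y, win=1200, net=512):
    """Map a box from network-input coordinates back to page coordinates."""
    s = win / net
    x1, y1, x2, y2 = box
    return (x1 * s + win_x, y1 * s + win_y, x2 * s + win_x, y2 * s + win_y)


def collect_page_boxes(window_detections):
    """window_detections: list of ((win_x, win_y), [boxes]) pairs."""
    page_boxes = []
    for (win_x, win_y), boxes in window_detections:
        for b in boxes:
            page_boxes.append(window_box_to_page(b, win_x, win_y))
    return page_boxes


# Made-up detections from two overlapping windows.
detections = [
    ((600, 480), [(256, 0, 512, 128)]),
    ((720, 480), [(0, 0, 256, 128)]),
]
page_boxes = collect_page_boxes(detections)
```

In this example the two windows each report a piece of the page that starts at y = 480; once both are in page coordinates, a pooling step can decide whether they belong to the same formula.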

B. Region Matching and Default Boxes in SSD

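SSD defines a fixed space of candidate detection regions organized in a spatial grid at multiple resolutions ('default boxes').

Paraphrase: SSD predefines a fixed set of candidate detection regions, the "default boxes", arranged on spatial grids at several resolutions.

Each default box may be resized and translated by the SSD network to fit target regions, and is associated with a confidence score.

Key terms: default box (predefined candidate box), target regions.

Paraphrase: each default box can be resized and translated by the SSD network so that it fits a target region more closely, and each has an associated confidence score.

Notes: "resized" means the box's width and height are adjusted; "translated" is meant geometrically, i.e. the box's position is shifted.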
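"Resize and translate" can be written down as a small decoding step. The sketch follows the offset parameterization described in the SSD paper (centre shifts relative to the default box size, log-scale width and height changes); real implementations often add extra scaling constants, which are omitted here. The function name `decode` and the example numbers are assumptions for illustration.

```python
import math


def decode(default_box, offsets):
    """Apply predicted offsets to a default box.

    default_box: (cx, cy, w, h); offsets: (t_x, t_y, t_w, t_h).
    """
    d_cx, d_cy, d_w, d_h = default_box
    t_x, t_y, t_w, t_h = offsets
    cx = d_cx + t_x * d_w        # translate: shift the centre horizontally
    cy = d_cy + t_y * d_h        # translate: shift the centre vertically
    w = d_w * math.exp(t_w)      # resize: scale the width
    h = d_h * math.exp(t_h)      # resize: scale the height
    return (cx, cy, w, h)


# A 64 x 64 default box moved 16 pixels right and made twice as wide.
box = decode((256, 256, 64, 64), (0.25, 0.0, math.log(2), 0.0))
```

All-zero offsets leave the default box unchanged, so the network only has to learn small corrections.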

Figure 1 shows default boxes of different sizes and aspect ratios overlaid on a 512×512 image.

Key terms: aspect ratios (width/height).

Paraphrase: Figure 1 shows default boxes of different sizes and aspect ratios overlaid on a 512×512 image.

Figure 1. Default boxes for a 512×512 window

In SSD, each feature map is a pixel grid, but the associated default boxes are defined in the original image coordinate space.

Paraphrase: in SSD, each feature map is a pixel grid, but its default boxes are defined in the coordinate space of the original image.

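The image is analyzed at multiple scales; here for illustration the 32 × 32 grid of default boxes is shown. In practice, if we used only the 32 × 32 default boxes, we might miss smaller objects.

Key terms: for illustration (as an example).

Paraphrase: the image is analyzed at several scales; Figure 1 shows only the 32 × 32 grid of default boxes as an example. In practice, using only the 32 × 32 grid could miss smaller objects.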

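Our metric for matching ground truth to candidate detection regions is the same as SSD [3]. Each ground truth box is matched to a default box with the highest IOU, and also with default boxes with an IOU greater than 0.5. Matching targets to more than one default box simplifies learning by allowing the network to predict higher scores for more boxes. The matched default boxes are considered positive examples (POS) and the remaining default boxes are considered negative examples (NEG).

Paraphrase: ground truth is matched to default boxes exactly as in SSD. Each ground-truth box is matched to the default box with the highest IoU, and also to every default box whose IoU with it exceeds 0.5. Matched default boxes are positive examples (POS); all remaining default boxes are negative examples (NEG).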
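The matching rule can be sketched in a few lines. Boxes are `(x1, y1, x2, y2)`; the function names and the example boxes are assumptions for illustration only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match(gt_boxes, default_boxes, threshold=0.5):
    """Return the indices of default boxes labelled positive (POS)."""
    positives = set()
    for gt in gt_boxes:
        overlaps = [iou(gt, d) for d in default_boxes]
        # Rule 1: the default box with the highest IoU is always matched.
        best = max(range(len(default_boxes)), key=lambda i: overlaps[i])
        positives.add(best)
        # Rule 2: every default box with IoU above the threshold is matched.
        positives.update(i for i, o in enumerate(overlaps) if o > threshold)
    return positives


defaults = [(0, 0, 100, 50), (10, 0, 110, 40), (200, 200, 300, 240), (400, 400, 500, 440)]
ground_truth = [(0, 0, 100, 40), (200, 200, 230, 240)]
pos = match(ground_truth, defaults)
neg = set(range(len(defaults))) - pos
```

The second ground-truth box overlaps its best default box with an IoU of only 0.3, yet that default box is still positive because of the highest-IoU rule; this guarantees every ground-truth box has at least one match.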

The original SSD [3] architecture uses aspect ratios (width/height) of {1, 2, 3, 1/2, 1/3}. However, as we see in Figure 4, there are many wide formulas with an aspect ratio greater than 3 in the dataset. As a result, wider default boxes will have a higher chance of matching wide formulas. So, in addition to the default boxes used in the original SSD, we also add the wider default boxes used in TextBoxes [29], with aspect ratios {5, 7, 10}. In our early experiments, these wider default boxes increased recall for large formulas.

Paraphrase: the original SSD uses default boxes with aspect ratios {1, 2, 3, 1/2, 1/3}. Because the dataset contains many wide formulas with aspect ratios above 3 (Figure 4 of the paper, not reproduced here), ScanSSD adds the wider ratios {5, 7, 10} from TextBoxes, which improved recall for large formulas in the authors' early experiments.
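To see what the extra aspect ratios do, here is a simplified default-box generator for a single grid. It uses the usual SSD convention that a box with scale s and aspect ratio r has width s·sqrt(r) and height s/sqrt(r). The `scale=0.1` value, the function name, and the omission of SSD's extra ratio-1 box are simplifying assumptions for illustration.

```python
import math

SSD_RATIOS = (1, 2, 3, 1 / 2, 1 / 3)   # original SSD aspect ratios
WIDE_RATIOS = (5, 7, 10)               # wider ratios added in ScanSSD


def default_boxes(image_size=512, grid=32, scale=0.1,
                  ratios=SSD_RATIOS + WIDE_RATIOS):
    """Boxes (cx, cy, w, h) in image pixels: one per grid cell and ratio."""
    boxes = []
    side = scale * image_size          # side of the ratio-1 box, in pixels
    for row in range(grid):
        for col in range(grid):
            # Box centres sit at the centre of each grid cell.
            cx = (col + 0.5) * image_size / grid
            cy = (row + 0.5) * image_size / grid
            for r in ratios:
                boxes.append((cx, cy, side * math.sqrt(r), side / math.sqrt(r)))
    return boxes


boxes = default_boxes()
widest = boxes[7]                      # ratio 10 in the first grid cell
print(len(boxes), round(widest[2], 1), round(widest[3], 1))
```

With these settings the ratio-10 box is roughly 162 pixels wide and 16 pixels tall, a shape much closer to a long inline or displayed formula than anything in the original ratio set.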

C. Postprocessing

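Figure 2 illustrates postprocessing in ScanSSD. We expand and/or shrink initial formula detections so that they are cropped around the connected components they contain and touch at their border. The goal is to capture entire characters belonging to a detection region, without additional padding.

Figure 2. After cropping

Paraphrase: as Figure 2 shows, each detection box is expanded and/or shrunk so that it fits tightly around the connected components it contains or touches. The goal is to capture whole characters belonging to the detection, without extra padding.

Notes: take the "0" in the figure above: at first only part of it lies inside the detection box; after the adjustment the whole character is inside.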
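One way to realise this expand-and-shrink step is a flood fill over the binarized image: start from every ink pixel inside the initial box, follow each connected component even where it leaves the box, and return the tight bounding box of everything reached. This is a sketch under assumptions (binary image as a list of lists, inclusive pixel coordinates, 8-connectivity, hypothetical function name), not the authors' implementation.

```python
from collections import deque


def fit_box_to_components(image, box):
    """image: 2D list of 0/1 (1 = ink). box: (x1, y1, x2, y2), inclusive.

    Returns the tight box around all connected components that have at
    least one pixel inside `box`, or None if the box contains no ink.
    """
    h, w = len(image), len(image[0])
    x1, y1, x2, y2 = box
    seen = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every ink pixel inside the initial box.
    for y in range(max(0, y1), min(h - 1, y2) + 1):
        for x in range(max(0, x1), min(w - 1, x2) + 1):
            if image[y][x]:
                seen[y][x] = True
                queue.append((x, y))
    if not queue:
        return None
    nx1, ny1, nx2, ny2 = w, h, -1, -1
    # Flood fill so components cut by the box are followed outside it.
    while queue:
        x, y = queue.popleft()
        nx1, ny1 = min(nx1, x), min(ny1, y)
        nx2, ny2 = max(nx2, x), max(ny2, y)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                px, py = x + dx, y + dy
                if (0 <= px < w and 0 <= py < h
                        and image[py][px] and not seen[py][px]):
                    seen[py][px] = True
                    queue.append((px, py))
    return (nx1, ny1, nx2, ny2)


rows = [
    "0000000000",
    "0011100000",   # a ring-shaped character at x = 2..4
    "0010100110",   # a second blob at x = 7..8
    "0011100110",
    "0000000000",
]
image = [[int(c) for c in row] for row in rows]
fitted = fit_box_to_components(image, (3, 0, 9, 4))
```

The initial box cuts through the ring and has padding on the other sides; the fitted box is `(2, 1, 8, 3)`, expanded on the left to take in the whole ring and shrunk elsewhere to remove the padding.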

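This postprocessing is done at two stages: first, before stitching, and second, after pooling regions to obtain output formula detections.

Paraphrase: this cropping is applied twice: once before the window-level results are stitched together, and once more after regions have been pooled into the final formula detections.

Notes: (1) The same formula may fall into several sliding windows, so after detecting formulas in each window, the window-level detections have to be stitched back onto the page. (2) The overlapping detections on the page are then pooled into the final output formulas. The cropping is therefore not a single step between (1) and (2): it runs first on each window's detections before (1), and again on the pooled regions after (2).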

Last edited: 2021.06.17 11:04:09

