Faster R-CNN

Network Architecture

Network structure

The network can be divided into three parts:

Head
Region Proposal Network(RPN)
Classification Network

Working flow

Network architecture

Region Proposal Network

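The feature map obtained from the CNN (the head) is the input. It passes through a $3\times 3$ 2D convolution and then splits into two branches: 1) one branch classifies the anchor boxes produced by the Anchor Generation Layer into two classes (background/foreground class scores and probabilities); 2) the other branch produces regression coefficients that refine the corresponding anchor boxes. Both outputs are then passed to the proposal layer for processing.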
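The two-branch layout is easier to picture from the output sizes. The sketch below assumes 9 anchors per feature-map location and a channels-first layout; the function name `rpn_output_shapes` and the example feature-map size are hypothetical.

```python
def rpn_output_shapes(feat_h, feat_w, num_anchors=9):
    """Shapes of the two RPN branches for a feat_h x feat_w feature map."""
    return {
        # two scores (background, foreground) per anchor per location
        "cls_scores": (2 * num_anchors, feat_h, feat_w),
        # four regression coefficients per anchor per location
        "bbox_coeffs": (4 * num_anchors, feat_h, feat_w),
        # number of anchor boxes that are scored and refined
        "total_anchors": num_anchors * feat_h * feat_w,
    }


if __name__ == "__main__":
    print(rpn_output_shapes(38, 50))
```

Every anchor therefore receives one foreground/background score pair and one set of four coefficients.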

Region Proposal Network

Training

Training updates the parameters of each layer. The layers can be roughly grouped as follows:

Anchor Generation Layer
Region Proposal Layer
- Proposal Layer
- Anchor Target Layer
- Proposal Target Layer
ROI Pooling Layer
Classification Layer

Faster R-CNN layers

Anchor Generation Layer

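This layer generates a fixed number of rectangular boxes (anchor boxes) of different sizes and aspect ratios. The boxes are then passed to the convolutional network, which decides which of them contain an object (positive anchors) and which do not (negative anchors). This decision is a binary classification problem.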

Anchor Generation Layer: This layer generates a fixed number of “anchors” (bounding boxes) by first generating 9 anchors of different scales and aspect ratios and then replicating these anchors by translating them across uniformly spaced grid points spanning the input image.
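As a minimal sketch of this layer, the code below builds 9 base shapes (3 aspect ratios times 3 scales) and replicates them over a uniform grid. The stride of 16, the scales of 128/256/512 pixels, the ratios, and the choice to centre anchors on grid-cell centres are assumptions made for illustration, as are the function names.

```python
import math


def base_anchor_shapes(ratios=(0.5, 1.0, 2.0), scales=(128, 256, 512)):
    """Return 9 (width, height) pairs; ratio is height / width."""
    shapes = []
    for ratio in ratios:
        for scale in scales:
            area = float(scale) ** 2          # keep the area fixed per scale
            w = math.sqrt(area / ratio)
            h = w * ratio
            shapes.append((w, h))
    return shapes


def generate_anchors(feat_h, feat_w, stride=16):
    """Translate the base shapes across every grid point of the feature map."""
    anchors = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            cx = (gx + 0.5) * stride          # grid-cell centre in image pixels
            cy = (gy + 0.5) * stride
            for w, h in base_anchor_shapes():
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    anchors = generate_anchors(2, 3)
    print(len(anchors))   # 2 * 3 grid points, 9 anchors each
```

The number of anchors depends only on the feature-map size, which is why the text calls it a fixed number.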

Anchor boxes generation.

Region Proposal Layer

It can be divided into three parts:

Proposal Layer
Anchor Target Layer
Proposal Target Layer

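Its main functions are:

Anchor classification: split the given anchors into those that contain an object (positive anchors) and those that do not (negative anchors).
Refinement: use regression to refine the anchor boxes (adjusting size and position) and obtain more accurate boxes.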

Proposal Layer

The goal of the Proposal Layer is to select some of the refined anchor boxes from the Region Proposal Network as proposals.

Proposal Layer: Transform the anchors according to the bounding box regression coefficients to generate transformed anchors. Then prune the number of anchors by applying non-maximum suppression (see Appendix) using the probability of an anchor being a foreground region.
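The transform step can be sketched as follows. This covers only the anchor transformation and clipping to the image, not the pruning. It assumes boxes are `(x1, y1, x2, y2)` tuples, that the coefficients are offsets of the box centre scaled by the anchor size plus log-scale factors for width and height, and that `decode_box` and `clip_box` are hypothetical helper names.

```python
import math


def decode_box(anchor, coeffs):
    """Apply regression coefficients (dx, dy, dw, dh) to one anchor box."""
    x1, y1, x2, y2 = anchor
    dx, dy, dw, dh = coeffs
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx = cx + dx * w              # shift the centre, scaled by anchor size
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)          # scale width and height in log space
    new_h = h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def clip_box(box, img_w, img_h):
    """Clamp a box so that it lies inside the image."""
    x1, y1, x2, y2 = box
    return (min(max(x1, 0), img_w), min(max(y1, 0), img_h),
            min(max(x2, 0), img_w), min(max(y2, 0), img_h))


if __name__ == "__main__":
    box = decode_box((0, 0, 10, 10), (0.1, 0.0, 0.0, 0.0))
    print(clip_box(box, 10, 10))
```

Coefficients of all zeros leave the anchor unchanged, which is a quick sanity check for any implementation of this step.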

Proposal Layer

Anchor Target Layer

Its role is to select the anchor boxes used to train the RPN and to compute the RPN loss. The RPN loss has two parts: 1) classification loss; 2) bounding box regression loss. Note that not all anchor boxes are trained on; only a subset of high-scoring anchor boxes is selected, because training on an invalid anchor box is pointless.

Anchor Target Layer: The goal of the anchor target layer is to produce a set of “good” anchors and the corresponding foreground/background labels and target regression coefficients to train the Region Proposal Network. The output of this layer is only used to train the RPN network and is not used by the classification layer. Given a set of anchors (produced by the anchor generation layer), the anchor target layer identifies promising foreground and background anchors. Promising foreground anchors are those whose overlap with some ground truth box is higher than a threshold. Background boxes are those whose overlap with any ground truth box is lower than a threshold. The anchor target layer also outputs a set of bounding box regressors, i.e., a measure of how far each anchor target is from the closest bounding box. These regressors only make sense for the foreground boxes as there is no notion of “closest bounding box” for a background box.

Input:

RPN Network Outputs (predicted foreground/background class labels, regression coefficients)

Anchor boxes (generated by the anchor generation layer)

Ground truth boxes

Output

Good foreground/background boxes and associated class labels

Target regression coefficients

Parameters:

TRAIN.RPN_POSITIVE_OVERLAP: Threshold used to select if an anchor box is a good foreground box (Default: 0.7)

TRAIN.RPN_NEGATIVE_OVERLAP: If the max overlap of an anchor with a ground truth box is lower than this threshold, it is marked as background. Boxes whose overlap is greater than RPN_NEGATIVE_OVERLAP but less than RPN_POSITIVE_OVERLAP are marked “don’t care”. (Default: 0.3)

TRAIN.RPN_BATCHSIZE: Total number of background and foreground anchors (default: 256)

TRAIN.RPN_FG_FRACTION: fraction of the batch size that is foreground anchors (default: 0.5). If the number of foreground anchors found is larger than TRAIN.RPN_BATCHSIZE $\times$ TRAIN.RPN_FG_FRACTION, the excess (indices are selected randomly) is marked “don’t care”.
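The labelling rule defined by these four parameters can be sketched as below. Labels are 1 (foreground), 0 (background) and -1 (“don’t care”). The sketch assumes `(x1, y1, x2, y2)` boxes with continuous coordinates, treats an overlap equal to the positive threshold as foreground, and caps background anchors at whatever remains of the batch, which the parameter list does not spell out. The names `iou` and `label_anchors` are hypothetical.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_overlap=0.7, neg_overlap=0.3,
                  batch_size=256, fg_fraction=0.5, seed=0):
    rng = random.Random(seed)
    labels = []
    for anchor in anchors:
        max_ov = max((iou(anchor, g) for g in gt_boxes), default=0.0)
        if max_ov >= pos_overlap:
            labels.append(1)       # good foreground anchor
        elif max_ov < neg_overlap:
            labels.append(0)       # background anchor
        else:
            labels.append(-1)      # in between: don't care
    # Randomly disable excess foreground anchors.
    max_fg = int(batch_size * fg_fraction)
    fg = [i for i, lab in enumerate(labels) if lab == 1]
    if len(fg) > max_fg:
        for i in rng.sample(fg, len(fg) - max_fg):
            labels[i] = -1
    # Fill the rest of the batch with background anchors (assumption).
    max_bg = batch_size - min(len(fg), max_fg)
    bg = [i for i, lab in enumerate(labels) if lab == 0]
    if len(bg) > max_bg:
        for i in rng.sample(bg, len(bg) - max_bg):
            labels[i] = -1
    return labels


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    anchors = [(0, 0, 10, 10), (0, 0, 10, 8), (0, 0, 10, 5), (20, 20, 30, 30)]
    print(label_anchors(anchors, gt))
```

Only anchors labelled 0 or 1 contribute to the classification loss, and only those labelled 1 contribute to the regression loss.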

Proposal Target Layer

This layer selects the ROIs that meet the overlap criteria from the ROIs output by the Proposal Layer. The selected ROIs are crop-pooled and combined with the feature maps. As with the Anchor Target Layer, not all ROIs are trained on; only a subset of high-scoring ones is picked, because training on an ROI that cannot be a proposal is pointless.

Proposal Target Layer: The goal of the proposal target layer is to prune the list of anchors produced by the proposal layer and produce class specific bounding box regression targets that can be used to train the classification layer to produce good class labels and regression targets.

The proposal target layer starts with the ROIs computed by the proposal layer. Using the max overlap of each ROI with all ground truth boxes, it categorizes the ROIs into background and foreground ROIs. Foreground ROIs are those for which max overlap exceeds a threshold (TRAIN.FG_THRESH, default: 0.5). Background ROIs are those whose max overlap falls between TRAIN.BG_THRESH_LO and TRAIN.BG_THRESH_HI (default 0.1, 0.5 respectively). This is an example of “hard negative mining” used to present difficult background examples to the classifier. The bbox_inside_weights array acts as a mask. It is 1 only for the correct class for each foreground ROI. It is zero for the background ROIs as well. Thus, while computing the bounding box regression component of the classification layer loss, only the regression coefficients for the foreground regions are taken into account. This is not the case for the classification loss – the background ROIs are included as well as they belong to the “background” class.

Input:

ROIs produced by the proposal layer

ground truth information

Output:

Selected foreground and background ROIs that meet overlap criteria.

Class specific target regression coefficients for the ROIs

Parameters:

TRAIN.FG_THRESH: (default: 0.5) Used to select foreground ROIs. ROIs whose max overlap with a ground truth box exceeds FG_THRESH are marked foreground

TRAIN.BG_THRESH_HI: (default 0.5)

TRAIN.BG_THRESH_LO: (default 0.1) These two thresholds are used to select background ROIs. ROIs whose max overlap falls between BG_THRESH_HI and BG_THRESH_LO are marked background

TRAIN.BATCH_SIZE: (default 128) Maximum number of foreground and background boxes selected.

TRAIN.FG_FRACTION: (default 0.25) The number of foreground boxes cannot exceed BATCH_SIZE $\times$ FG_FRACTION.
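A minimal sketch of the selection controlled by these parameters is shown below. It returns a mapping from ROI index to class label, with 0 standing for background. The box format `(x1, y1, x2, y2)`, the use of class 0 for background, the inclusive lower bounds on the thresholds, and the names `iou` and `sample_rois` are assumptions for illustration.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_rois(rois, gt_boxes, gt_classes, fg_thresh=0.5, bg_thresh_lo=0.1,
                bg_thresh_hi=0.5, batch_size=128, fg_fraction=0.25, seed=0):
    rng = random.Random(seed)
    fg, bg = {}, []
    for i, roi in enumerate(rois):
        # Find the ground truth box with the largest overlap and its class.
        max_ov, cls = 0.0, 0
        for gt, gt_cls in zip(gt_boxes, gt_classes):
            ov = iou(roi, gt)
            if ov > max_ov:
                max_ov, cls = ov, gt_cls
        if max_ov >= fg_thresh:
            fg[i] = cls                       # foreground: inherits the class
        elif bg_thresh_lo <= max_ov < bg_thresh_hi:
            bg.append(i)                      # background candidate
    max_fg = int(batch_size * fg_fraction)
    fg_idx = sorted(fg)
    if len(fg_idx) > max_fg:
        fg_idx = rng.sample(fg_idx, max_fg)
    max_bg = batch_size - len(fg_idx)
    if len(bg) > max_bg:
        bg = rng.sample(bg, max_bg)
    labels = {i: fg[i] for i in fg_idx}
    labels.update({i: 0 for i in bg})
    return labels


if __name__ == "__main__":
    gt_boxes, gt_classes = [(0, 0, 10, 10)], [3]
    rois = [(0, 0, 10, 10), (0, 0, 10, 3), (50, 50, 60, 60), (0, 0, 10, 6)]
    print(sample_rois(rois, gt_boxes, gt_classes))
```

In the usage example the third ROI does not overlap the ground truth box at all, so it falls below BG_THRESH_LO and is not selected.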

RoI Pooling

It can handle input images of different sizes.

ROI Pooling Layer: Implements a spatial transformation network that samples the input feature map given the bounding box coordinates of the region proposals produced by the proposal target layer. These coordinates will generally not lie on integer boundaries, thus interpolation based sampling is required.
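The interpolation-based sampling can be illustrated with a small bilinear crop-and-resize on a single-channel feature map. This is a sketch only: it assumes the box is already expressed in feature-map coordinates, samples a regular grid that includes both box edges, and uses a 7 by 7 output by default. The names `bilinear` and `crop_and_resize` are hypothetical.

```python
def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2D list `fmap` at the real-valued point (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)             # clamp to the feature map
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
    bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def crop_and_resize(fmap, box, out_size=7):
    """Sample an out_size x out_size grid inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    out = []
    for i in range(out_size):
        # Fraction of the way through the box; the centre when out_size == 1.
        fy = i / (out_size - 1) if out_size > 1 else 0.5
        row = []
        for j in range(out_size):
            fx = j / (out_size - 1) if out_size > 1 else 0.5
            row.append(bilinear(fmap, y1 + fy * (y2 - y1), x1 + fx * (x2 - x1)))
        out.append(row)
    return out


if __name__ == "__main__":
    fmap = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(crop_and_resize(fmap, (0.5, 0.5, 1.5, 1.5), out_size=2))
```

Whatever the size of the proposal, the output has the same fixed size, which is what lets the following layers accept regions of any shape.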

Classification Layer

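This differs from the earlier anchor classification. Anchor classification is binary, whereas this layer outputs the probability of belonging to each class, and there can be many classes.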

Classification Layer: The classification layer takes the output feature maps produced by the ROI Pooling Layer and passes them through a series of convolutional layers. The output is fed through two fully connected layers. The first layer produces the class probability distribution for each region proposal and the second layer produces a set of class specific bounding box regressors.

Classification Layer

Appendix

Bounding Box Regression Coefficients

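$T$ and $O$ denote the “target” and “original” parameters respectively, and $x, y, w, h$ denote the x and y coordinates, the width and the height. The coefficients are invariant under affine transformation. This matters because, when the classification loss is computed, the target coefficients are calculated from the original parameters, whereas the coefficients the network is expected to output are calculated after ROI pooling.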

Note that the regression coefficients are invariant to an affine transformation with no shear. This is an important point because, when calculating the classification loss, the target regression coefficients are calculated in the original aspect ratio while the classification network output regression coefficients are calculated after the ROI pooling step on square feature maps (1:1 aspect ratio).
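For reference, the parameterization commonly used in Faster R-CNN, assumed here to be the one this section refers to, can be written with the symbols $T$ and $O$ as:

$$
t_x = \frac{T_x - O_x}{O_w}, \quad
t_y = \frac{T_y - O_y}{O_h}, \quad
t_w = \log\frac{T_w}{O_w}, \quad
t_h = \log\frac{T_h}{O_h}
$$

Under this assumption the invariance follows directly. If every x coordinate is mapped to $a x + b$ (so every width is mapped to $a w$, with $a > 0$), then

$$
t_x' = \frac{(a T_x + b) - (a O_x + b)}{a\,O_w} = \frac{T_x - O_x}{O_w} = t_x, \qquad
t_w' = \log\frac{a\,T_w}{a\,O_w} = \log\frac{T_w}{O_w} = t_w
$$

and the same argument applies to $t_y$ and $t_h$ with an independent scale and offset along y.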

Intersection over Union (IoU) Overlap

IoU
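IoU is the area of the intersection of two boxes divided by the area of their union. A minimal sketch, assuming boxes are `(x1, y1, x2, y2)` tuples with continuous coordinates (no "+1" pixel convention):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Width and height of the overlapping rectangle (zero if disjoint).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

In the usage example the boxes share half of their area, so the intersection is 50 and the union is 150.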

Non-Maximum Suppression

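When there are multiple candidate boxes, non-maximum suppression (NMS) is used to suppress the redundant ones. Suppression is an iterate-traverse-eliminate process:

Sort all boxes by score and select the highest-scoring box;
Traverse the remaining boxes and delete any box whose overlap (IoU) with the current highest-scoring box exceeds a given threshold;
Pick the highest-scoring box among the unprocessed boxes and repeat the process until the list of bounding boxes is empty.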
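The three steps above can be sketched as a greedy loop. The box format `(x1, y1, x2, y2)`, the default threshold of 0.7, and the names `iou` and `nms` are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.7):
    """Return the indices of the boxes that survive suppression."""
    # Step 1: sort box indices by score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Step 2: drop boxes that overlap the current best box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
        # Step 3: the loop repeats with the best remaining box.
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores, iou_thresh=0.5))
```

The first two boxes in the usage example have an IoU of about 0.68, so the second one is suppressed at a threshold of 0.5 but kept at 0.7.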

NMS.
