Faster R-CNN

Network Architecture

Network structure

The network can be divided into three parts:

  • Head
  • Region Proposal Network(RPN)
  • Classification Network
Working flow
Network architecture

Region Proposal Network

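The feature map produced by the CNN (head) is the input. After a 3×3 2D convolution, it splits into two branches: 1) one branch classifies the anchor boxes produced by the Anchor Generation Layer into two classes (background/foreground class scores and probabilities); 2) the other branch produces regression coefficients that refine the corresponding anchor boxes. The outputs of both branches are then passed to the proposal layer for processing.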


Training

The parameters of each layer are trained. The layers can be broadly grouped as follows:

  • Anchor Generation Layer
  • Region Proposal Layer
    • Proposal Layer
    • Anchor Target Layer
    • Proposal Target Layer
  • ROI Pooling Layer
  • Classification Layer
Faster R-CNN layers

Anchor Generation Layer

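Generates a fixed number of rectangular boxes (anchor boxes) of different sizes and aspect ratios. These boxes are then passed to the convolutional network, which decides which ones contain an object (positive anchors) and which do not (negative anchors). This decision is a binary classification problem.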

Anchor Generation Layer: This layer generates a fixed number of “anchors” (bounding boxes) by first generating 9 anchors of different scales and aspect ratios and then replicating these anchors by translating them across uniformly spaced grid points spanning the input image.

Anchor boxes generation.
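
As a rough illustration of the generate-then-replicate idea, here is a minimal sketch in plain Python. The stride of 16, the scales (8, 16, 32) and the ratios (0.5, 1, 2) are commonly used defaults and are assumptions here, not values stated above; the function names `base_anchors` and `grid_anchors` are hypothetical.

```python
import math

def base_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return len(ratios) * len(scales) anchors (x1, y1, x2, y2) centred on (0, 0)."""
    anchors = []
    for ratio in ratios:              # ratio is height / width
        for scale in scales:
            area = (base_size * scale) ** 2
            w = math.sqrt(area / ratio)
            h = w * ratio             # keeps w * h equal to area
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

def grid_anchors(feat_h, feat_w, stride=16):
    """Translate the base anchors to every cell of a feat_h x feat_w grid."""
    base = base_anchors(base_size=stride)
    out = []
    for gy in range(feat_h):
        for gx in range(feat_w):
            # Centre of this grid cell in input-image coordinates (assumed convention).
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride
            for x1, y1, x2, y2 in base:
                out.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return out

anchors = grid_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 grid cells * 9 base anchors = 54
```

The number of anchors depends only on the feature map size, which is why the layer produces a fixed number of boxes for a given input resolution.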

Region Proposal Layer

It can be divided into three parts:

  • Proposal Layer
  • Anchor Target Layer
  • Proposal Target Layer

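Its main functions are:

  • Anchor classification: split the given anchors into those that contain an object (positive anchors) and those that do not (negative anchors).
  • Refinement: use regression to refine the anchor boxes (adjusting their size and position) to obtain more accurate boxes.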

Proposal Layer

The goal of the Proposal Layer is to select some of the refined anchor boxes from the Region Proposal Network as proposals.

Proposal Layer: Transform the anchors according to the bounding box regression coefficients to generate transformed anchors. Then prune the number of anchors by applying non-maximum suppression (see Appendix) using the probability of an anchor being a foreground region.
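
The transform step can be sketched as follows. This is a minimal illustration that assumes the usual Faster R-CNN parameterization (centre shifts scaled by the anchor size, log-scaled width and height); the helper names, the clipping step and the `pre_nms_top_n` cut-off are assumptions made for the example. Non-maximum suppression, described in the Appendix, would then be applied to the result.

```python
import math

def apply_deltas(anchor, deltas):
    """Shift and resize one (x1, y1, x2, y2) anchor by coefficients (tx, ty, tw, th)."""
    x1, y1, x2, y2 = anchor
    tx, ty, tw, th = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    new_cx, new_cy = cx + tx * w, cy + ty * h
    new_w, new_h = w * math.exp(tw), h * math.exp(th)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

def clip_box(box, img_w, img_h):
    """Clamp a box to the image boundaries."""
    x1, y1, x2, y2 = box
    return (min(max(x1, 0.0), img_w), min(max(y1, 0.0), img_h),
            min(max(x2, 0.0), img_w), min(max(y2, 0.0), img_h))

def top_proposals(anchors, deltas, fg_scores, img_w, img_h, pre_nms_top_n=2):
    """Transform, clip, and keep the highest-scoring boxes (NMS not included)."""
    boxes = [clip_box(apply_deltas(a, d), img_w, img_h)
             for a, d in zip(anchors, deltas)]
    order = sorted(range(len(boxes)), key=lambda i: fg_scores[i], reverse=True)
    return [(boxes[i], fg_scores[i]) for i in order[:pre_nms_top_n]]

proposals = top_proposals(
    anchors=[(0, 0, 10, 10), (20, 20, 40, 40), (90, 90, 120, 120)],
    deltas=[(0, 0, 0, 0), (0.1, 0, 0, 0), (0, 0, 0, 0)],
    fg_scores=[0.2, 0.9, 0.6],
    img_w=100, img_h=100,
)
for box, score in proposals:
    print(score, box)
```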


Anchor Target Layer

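Its role is to select the anchor boxes that can be used to train the RPN and to compute the RPN loss. The RPN loss has two parts: 1) Classification Loss; 2) Bounding Box Regression Loss. Note that not all anchor boxes are used for training; only a subset of promising anchor boxes, chosen by their overlap with the ground truth boxes, is selected. Training on an invalid anchor box is meaningless.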

Anchor Target Layer: The goal of the anchor target layer is to produce a set of “good” anchors and the corresponding foreground/background labels and target regression coefficients to train the Region Proposal Network. The output of this layer is only used to train the RPN network and is not used by the classification layer. Given a set of anchors (produced by the anchor generation layer), the anchor target layer identifies promising foreground and background anchors. Promising foreground anchors are those whose overlap with some ground truth box is higher than a threshold. Background boxes are those whose overlap with any ground truth box is lower than a threshold. The anchor target layer also outputs a set of bounding box regressors, i.e., a measure of how far each anchor target is from the closest bounding box. These regressors only make sense for the foreground boxes, as there is no notion of “closest bounding box” for a background box.

Input:

  • RPN Network Outputs (predicted foreground/background class labels, regression coefficients)
  • Anchor boxes (generated by the anchor generation layer)
  • Ground truth boxes

Output:

  • Good foreground/background boxes and associated class labels
  • Target regression coefficients

Parameters:

  • TRAIN.RPN_POSITIVE_OVERLAP: Threshold used to select if an anchor box is a good foreground box (Default: 0.7)
  • TRAIN.RPN_NEGATIVE_OVERLAP: If the max overlap of an anchor with a ground truth box is lower than this threshold, it is marked as background. Boxes whose overlap is greater than RPN_NEGATIVE_OVERLAP but less than RPN_POSITIVE_OVERLAP are marked “don’t care”. (Default: 0.3)
  • TRAIN.RPN_BATCHSIZE: Total number of background and foreground anchors (default: 256)
  • TRAIN.RPN_FG_FRACTION: Fraction of the batch size that is foreground anchors (default: 0.5). If the number of foreground anchors found is larger than TRAIN.RPN_BATCHSIZE × TRAIN.RPN_FG_FRACTION, the excess (indices are selected randomly) is marked “don’t care”.
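
The way these four parameters interact can be sketched in a few lines of Python. This is a simplified illustration, not the reference implementation: labels are 1 (foreground), 0 (background) and -1 (“don’t care”), the function name `label_anchors` is hypothetical, and the additional rule some implementations use (always marking the best anchor for each ground truth box as foreground) is left out.

```python
import random

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_overlap=0.7, neg_overlap=0.3,
                  batch_size=256, fg_fraction=0.5, rng=None):
    rng = rng or random.Random(0)
    labels = []
    for anchor in anchors:
        max_iou = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if max_iou >= pos_overlap:
            labels.append(1)       # good foreground anchor
        elif max_iou < neg_overlap:
            labels.append(0)       # background anchor
        else:
            labels.append(-1)      # in between: "don't care"
    # Cap the foreground count; randomly chosen excess becomes "don't care".
    max_fg = int(batch_size * fg_fraction)
    fg = [i for i, lab in enumerate(labels) if lab == 1]
    for i in rng.sample(fg, max(0, len(fg) - max_fg)):
        labels[i] = -1
    # Fill the rest of the batch with background anchors, dropping the excess.
    max_bg = batch_size - labels.count(1)
    bg = [i for i, lab in enumerate(labels) if lab == 0]
    for i in rng.sample(bg, max(0, len(bg) - max_bg)):
        labels[i] = -1
    return labels

gt = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 12), (5, 5, 15, 15), (0, 0, 10, 20), (50, 50, 60, 60)]
print(label_anchors(anchors, gt))                # [1, 1, 0, -1, 0]
small = label_anchors(anchors, gt, batch_size=2)
print(small.count(1), small.count(0))            # 1 1
```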

Proposal Target Layer

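From the ROIs output by the Proposal Layer, this layer selects those that meet the overlap criteria. The selected ROIs are then combined with the feature maps through crop pooling. As with the Anchor Target Layer, not all ROIs are used for training; only a promising subset is picked, because training on an ROI that cannot be a proposal is meaningless.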

Proposal Target Layer: The goal of the proposal target layer is to prune the list of anchors produced by the proposal layer and produce class specific bounding box regression targets that can be used to train the classification layer to produce good class labels and regression targets.

The proposal target layer starts with the ROIs computed by the proposal layer. Using the max overlap of each ROI with all ground truth boxes, it categorizes the ROIs into background and foreground ROIs. Foreground ROIs are those for which max overlap exceeds a threshold (TRAIN.FG_THRESH, default: 0.5). Background ROIs are those whose max overlap falls between TRAIN.BG_THRESH_LO and TRAIN.BG_THRESH_HI (default 0.1, 0.5 respectively). This is an example of “hard negative mining” used to present difficult background examples to the classifier. The bbox_inside_weights array acts as a mask. It is 1 only for the correct class for each foreground ROI. It is zero for the background ROIs as well. Thus, while computing the bounding box regression component of the classification layer loss, only the regression coefficients for the foreground regions are taken into account. This is not the case for the classification loss – the background ROIs are included as well as they belong to the “background” class.

Input:

  • ROIs produced by the proposal layer
  • ground truth information

Output:

  • Selected foreground and background ROIs that meet overlap criteria.
  • Class specific target regression coefficients for the ROIs

Parameters:

  • TRAIN.FG_THRESH: (default: 0.5) Used to select foreground ROIs. ROIs whose max overlap with a ground truth box exceeds FG_THRESH are marked foreground
  • TRAIN.BG_THRESH_HI: (default 0.5)
  • TRAIN.BG_THRESH_LO: (default 0.1) These two thresholds are used to select background ROIs. ROIs whose max overlap falls between BG_THRESH_HI and BG_THRESH_LO are marked background
  • TRAIN.BATCH_SIZE: (default 128) Maximum number of foreground and background boxes selected.
  • TRAIN.FG_FRACTION: (default 0.25) The number of foreground boxes can’t exceed BATCH_SIZE*FG_FRACTION.
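
A minimal sketch of this selection, using the defaults listed above, might look as follows. The function name `sample_rois`, the use of class index 0 for background, and the `>=` comparison for the foreground threshold are assumptions for illustration; the sketch also assumes at least one ground truth box and leaves out the regression targets and bbox_inside_weights.

```python
import random

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def sample_rois(rois, gt_boxes, gt_classes, fg_thresh=0.5, bg_thresh_hi=0.5,
                bg_thresh_lo=0.1, batch_size=128, fg_fraction=0.25, rng=None):
    """Return a list of (roi, class_label) pairs; label 0 means background."""
    rng = rng or random.Random(0)
    fg, bg = [], []
    for roi in rois:
        overlaps = [iou(roi, gt) for gt in gt_boxes]
        best = max(range(len(overlaps)), key=lambda i: overlaps[i])
        if overlaps[best] >= fg_thresh:
            fg.append((roi, gt_classes[best]))   # inherits the class of the best match
        elif bg_thresh_lo <= overlaps[best] < bg_thresh_hi:
            bg.append((roi, 0))                  # "hard" background example
        # ROIs below bg_thresh_lo are ignored entirely.
    max_fg = int(batch_size * fg_fraction)
    if len(fg) > max_fg:
        fg = rng.sample(fg, max_fg)
    max_bg = batch_size - len(fg)
    if len(bg) > max_bg:
        bg = rng.sample(bg, max_bg)
    return fg + bg

rois = [(0, 0, 10, 10), (0, 0, 10, 25), (5, 5, 15, 15), (50, 50, 60, 60)]
selected = sample_rois(rois, gt_boxes=[(0, 0, 10, 10)], gt_classes=[3])
print(selected)
```

In this example the first ROI is foreground with class 3, the second and third fall into the background band, and the last one overlaps too little to be used at all.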

RoI Pooling

Works with images of different sizes.

ROI Pooling Layer: Implements a spatial transformation network that samples the input feature map given the bounding box coordinates of the region proposals produced by the proposal target layer. These coordinates will generally not lie on integer boundaries, thus interpolation based sampling is required.
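
To make the interpolation-based sampling concrete, here is a small single-channel sketch in plain Python. It samples one point at the centre of each output bin using bilinear interpolation; the function names `bilinear` and `crop_pool`, the one-sample-per-bin choice and the 2×2 output size are assumptions for illustration rather than the behaviour of any particular implementation.

```python
def bilinear(feat, x, y):
    """Bilinearly interpolate a 2D list feat[row][col] at a real-valued (x, y)."""
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def crop_pool(feat, roi, out_size=2):
    """Resample the ROI (x1, y1, x2, y2 in feature-map coordinates) to out_size x out_size."""
    x1, y1, x2, y2 = roi
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Sample at the centre of each output bin.
            sx = x1 + (j + 0.5) * (x2 - x1) / out_size
            sy = y1 + (i + 0.5) * (y2 - y1) / out_size
            row.append(bilinear(feat, sx, sy))
        out.append(row)
    return out

feat = [[4 * r + c for c in range(4)] for r in range(4)]   # feat[r][c] = 4r + c
print(crop_pool(feat, (0.5, 0.5, 2.5, 2.5)))   # [[5.0, 6.0], [9.0, 10.0]]
print(crop_pool(feat, (0.0, 0.0, 3.0, 1.0)))   # a different ROI shape, same 2x2 output size
```

Whatever the size or aspect ratio of the ROI, the output has the same fixed shape, which is what lets the following layers accept proposals of any size.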

Classification Layer

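This differs from the earlier anchor classification. Anchor classification is binary, whereas here the output is the probability of belonging to each class, and there can be many classes.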

Classification Layer: The classification layer takes the output feature maps produced by the ROI Pooling Layer and passes them through a series of convolutional layers. The output is fed through two fully connected layers. The first layer produces the class probability distribution for each region proposal and the second layer produces a set of class specific bounding box regressors.


Appendix

Bounding Box Regression Coefficients

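In the coefficient definitions, T and O denote the “target” and “original” parameters, respectively, and x, y, w, h denote the x and y coordinates, the width, and the height. The coefficients are invariant to affine transformations. This matters because, when the classification loss is computed, the target coefficients are calculated from the original parameters, whereas the coefficients the network is expected to output are computed after ROI pooling.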

Note the regression coefficients are invariant to an affine transformation with no shear. This is an important point as while calculating the classification loss, the target regression coefficients are calculated in the original aspect ratio while the classification network output regression coefficients are calculated after the ROI pooling step on square feature maps (1:1 aspect ratio).
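
The invariance can be checked numerically. The sketch below assumes the standard Faster R-CNN parameterization, t_x = (T_x - O_x)/O_w, t_y = (T_y - O_y)/O_h, t_w = log(T_w/O_w), t_h = log(T_h/O_h), with x and y taken as box centres; the function names are hypothetical.

```python
import math

def to_cxcywh(box):
    """Convert (x1, y1, x2, y2) to (centre x, centre y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def regression_coeffs(original, target):
    """Coefficients (tx, ty, tw, th) that map the original box onto the target box."""
    ox, oy, ow, oh = to_cxcywh(original)
    tx, ty, tw, th = to_cxcywh(target)
    return ((tx - ox) / ow, (ty - oy) / oh, math.log(tw / ow), math.log(th / oh))

def affine(box, sx, sy, dx, dy):
    """Scale x by sx and y by sy, then translate by (dx, dy): an affine map with no shear."""
    x1, y1, x2, y2 = box
    return (x1 * sx + dx, y1 * sy + dy, x2 * sx + dx, y2 * sy + dy)

original, target = (10, 20, 50, 100), (20, 30, 70, 90)
before = regression_coeffs(original, target)
after = regression_coeffs(affine(original, 0.5, 2.0, 3.0, -7.0),
                          affine(target, 0.5, 2.0, 3.0, -7.0))
print(before)
print(after)   # same values up to floating point error
```

Because the centre offsets are divided by the original width and height, and the size terms are ratios, any per-axis scaling or translation applied to both boxes cancels out.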

Intersection over Union (IoU) Overlap

IoU
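
IoU is the area of the intersection of two boxes divided by the area of their union. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Width and height of the overlap region (zero if the boxes do not overlap).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))     # 1.0, identical boxes
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))     # 25 / 175, about 0.143
print(iou((0, 0, 10, 10), (50, 50, 60, 60)))   # 0.0, no overlap
```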

Non-Maximum Suppression

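When there are multiple candidate boxes, non-maximum suppression (NMS) is used to suppress the redundant ones. Suppression is an iterate, traverse, and eliminate process:

  • Sort all boxes by score and select the box with the highest score;
  • Traverse the remaining boxes and delete any box whose overlap (IoU) with the current highest-scoring box exceeds a given threshold;
  • From the boxes not yet processed, pick the one with the highest score and repeat the process until the list of boxes is empty.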
NMS.
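
The three steps above can be sketched as a short greedy loop. The threshold of 0.5 in the example and the function names are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept, in order of decreasing score."""
    # Step 1: sort box indices by score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Step 2: drop every remaining box that overlaps the current best too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # Step 3: loop again with the highest-scoring box that is left.
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2]: the second box overlaps the first (IoU about 0.68)
```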