FSOD Paper Translation


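Figure 4. Our network architecture uses ResNet-50 as the backbone. The support image (green) and the query image (blue) are fed into a weight-shared backbone. The RPN uses the attention feature, which is generated by depth-wise cross correlation between the compact 1×1×C support feature and the H×W×C query feature. The class scores produced by the patch-relation head (top), the global-relation head (middle) and the local-correlation head (bottom) are summed to give the final matching score, and the bounding-box predictions are generated by the patch-relation head.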

In an R-CNN framework, the RPN module is followed by a detector that plays an important role in re-scoring the proposals and recognizing their classes. We therefore want the detector to have a strong ability to discriminate between categories. To this end, we propose a novel multi-relation detector that effectively measures the similarity between proposal boxes from the query image and the support objects. The detector includes three attention modules: the patch-relation head, which learns a deep non-linear metric for patch matching; the global-relation head, which learns a deep embedding for global matching; and the local-correlation head, which learns the pixel-wise and depth-wise cross correlation between support and query proposals. We show experimentally that the three matching modules complement each other, and that performance improves incrementally as they are added one by one. The details of the multi-relation detector are introduced below.


Figure 5. Attention RPN. The support feature is average-pooled to a 1×1×C vector, and its depth-wise cross correlation with the query feature is then calculated. The output is used as the attention feature and is fed into the RPN to generate proposals.
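As a rough illustration of Figure 5, the sketch below computes the attention feature with NumPy. It assumes channels-last arrays and uses a hypothetical helper name, `attention_feature`. A real implementation would operate on batched framework tensors. Because the pooled support kernel is 1×1, depth-wise cross correlation reduces to scaling each query channel by the matching pooled support value.

```python
import numpy as np


def attention_feature(query_feat: np.ndarray, support_feat: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of the attention-RPN feature (channels-last layout assumed).

    query_feat:   (H, W, C) query feature map from the shared backbone
    support_feat: (Hs, Ws, C) support feature map from the same backbone
    Returns an (H, W, C) attention feature to be fed into the RPN.
    """
    # Average-pool the support feature into a compact 1x1xC kernel.
    kernel = support_feat.mean(axis=(0, 1))  # shape (C,)
    # Depth-wise cross correlation: channel c of the query is correlated only
    # with channel c of the kernel. With a 1x1 kernel this is a per-channel
    # multiplication, broadcast as (H, W, C) * (C,).
    return query_feat * kernel


rng = np.random.default_rng(0)
query = rng.standard_normal((38, 50, 8))    # hypothetical sizes, C = 8
support = rng.standard_normal((20, 20, 8))
print(attention_feature(query, support).shape)  # (38, 50, 8)
```

The output has the same H×W×C shape as the query feature, so a standard RPN can consume it unchanged.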

• In the patch-relation head, we first concatenate the support and query proposal feature maps in depth. The combined feature map is then fed into the patch-relation module, whose structure is shown in Table 2. All convolution and pooling layers in this module use a padding of 0, which reduces the feature map from 7 × 7 to 1 × 1. This 1 × 1 output is used as the input to the binary classification and regression heads. The module is compact and efficient. We briefly explored variations of its structure and found that replacing the two average-pooling layers with convolutions does not improve the model further.

• The global-relation head extends the patch relation to model the global-embedding relation between the support and query proposals. Given the concatenated feature of the support and its query proposal, we average-pool it to a vector of size 1 × 1 × 2C. We then use an MLP with two fully connected (fc) layers followed by ReLU, and a final fc layer, to generate the matching scores.
• The local-correlation head computes the pixel-wise and depth-wise similarity between the object RoI feature and the proposal feature, similar to Eq. 1. Unlike Eq. 1, we compute the dot product between feature pairs at the same depth. Specifically, we first apply a weight-shared 1×1×C convolution to the support and query features separately. We then compute the depth-wise similarity feature of size 1 × 1 × C. Finally, a subsequent fc layer generates the matching scores.
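Table 2 is not reproduced in this excerpt, so the layer list below is a hypothetical stand-in. It places two 3×3 average-pooling layers (the two mentioned above) around 1×1 and 3×3 convolutions, all with stride 1 and padding 0. The sketch only traces spatial sizes with the standard formula ⌊(n + 2p − k)/s⌋ + 1, to show how a padding of 0 takes the 7 × 7 input down to 1 × 1.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical layer sequence (name, kernel size). Table 2 of the paper is the
# authoritative definition. Every layer here uses stride 1 and padding 0.
PATCH_RELATION_LAYERS = [
    ("avg_pool", 3),
    ("conv", 1),
    ("conv", 3),
    ("conv", 1),
    ("avg_pool", 3),
]


def trace_patch_relation(size: int = 7) -> list[tuple[str, int]]:
    """Return (layer name, output spatial size) for each layer, starting from size x size."""
    trace = []
    for name, kernel in PATCH_RELATION_LAYERS:
        size = out_size(size, kernel)
        trace.append((name, size))
    return trace


for name, size in trace_patch_relation():
    print(f"{name:8s} -> {size}x{size}")
```

With padding 0, each 3×3 layer removes one pixel from every border, so two such pools and one 3×3 convolution shrink 7 → 5 → 3 → 1. With a padding of 1 the map would stay at 7 × 7.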
We use only the patch-relation head to generate bounding-box predictions (i.e., regression of box coordinates), and we use the sum of the matching scores from all three heads as the final matching score. Intra-class variance and imperfect proposals make the relation between proposals and support objects complex. Our three relation heads have different properties and together handle this complexity well: the patch-relation head generates a flexible embedding that can match intra-class variations, the global-relation head provides stable and general matching, and the local-correlation head requires matching on parts.
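The sketch below illustrates the global-relation and local-correlation heads and the score fusion described above. It uses NumPy, with random weights standing in for trained ones. The channel count, the MLP width, and the placement of ReLU after each of the first two fc layers are assumptions. The patch-relation score is passed in as a plain number because that head is not implemented here. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 16       # hypothetical channel count (the real backbone is much wider)
HIDDEN = 32  # hypothetical MLP width
S = 7        # RoI feature size from the text (7 x 7)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def linear_params(n_in: int, n_out: int):
    """Randomly initialised fc layer (weights, bias) standing in for trained ones."""
    return rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out)


# Global-relation head, assumed layout: fc -> ReLU -> fc -> ReLU -> fc.
G1, G2, G3 = linear_params(2 * C, HIDDEN), linear_params(HIDDEN, HIDDEN), linear_params(HIDDEN, 1)
# Local-correlation head: a weight-shared 1x1xC conv (a C x C matrix) and one fc layer.
W_SHARED = rng.standard_normal((C, C)) / np.sqrt(C)
L1 = linear_params(C, 1)


def global_relation_score(support: np.ndarray, query: np.ndarray) -> float:
    """support, query: (S, S, C) RoI features -> scalar matching score."""
    pair = np.concatenate([support, query], axis=-1)  # (S, S, 2C), concatenated in depth
    v = pair.mean(axis=(0, 1))                        # average-pooled to a 2C vector
    h = relu(v @ G1[0] + G1[1])
    h = relu(h @ G2[0] + G2[1])
    return float((h @ G3[0] + G3[1])[0])


def local_correlation_score(support: np.ndarray, query: np.ndarray) -> float:
    """Depth-wise similarity: dot product of the two RoIs within each channel."""
    s = support @ W_SHARED            # shared 1x1 conv = per-pixel channel mixing
    q = query @ W_SHARED
    sim = (s * q).sum(axis=(0, 1))    # (C,), i.e. the 1 x 1 x C similarity feature
    return float((sim @ L1[0] + L1[1])[0])


def final_score(patch_score: float, support: np.ndarray, query: np.ndarray) -> float:
    """Final matching score = sum of the three heads' scores."""
    return patch_score + global_relation_score(support, query) + local_correlation_score(support, query)


support_roi = rng.standard_normal((S, S, C))
query_roi = rng.standard_normal((S, S, C))
print(final_score(0.0, support_roi, query_roi))
```

Only the scores are fused in this sketch. Box regression would come from the patch-relation head alone, as the text states.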

Training Details

The model is trained end to end on 4 Tesla P40 GPUs using SGD with a weight decay of 0.0001 and a momentum of 0.9. The learning rate is 0.002 for the first 56,000 iterations and 0.0002 for the remaining 4,000 iterations. We start from a pretrained model whose backbone, ResNet-50, was trained on [14, 9]. Since our test set does not overlap with those datasets, it is safe to use it. During training, we found that more training iterations hurt performance, and we suspect that too many iterations cause the model to overfit the training set. We freeze blocks Res1-3 and train only the higher-level layers, which lets the model use low-level basic features while avoiding overfitting. The query image is resized so that its shorter edge is 600 pixels, with the longer edge capped at 1000 pixels. The support image is cropped around the target object with 16 pixels of image context, then resized and zero-padded to a 320 × 320 square image. For few-shot training and testing, we fuse features by averaging the object features and then feed the result to the attention RPN module and the multi-relation detector.
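Several of the training settings above can be expressed as small helpers. The learning-rate step, the 600/1000 query-resizing rule, the 16-pixel support context, and K-shot averaging come from the text. The aspect-preserving resize of the support crop, with padding added on the right and bottom, is an assumption, as are all helper names.

```python
BASE_LR = 0.002
LATE_LR = 0.0002     # learning rate for the last 4,000 iterations
STEP_ITER = 56_000   # iteration at which the learning rate drops
MAX_ITER = 60_000    # 56,000 + 4,000 iterations in total


def learning_rate(iteration: int) -> float:
    """Step schedule from the text: 0.002, then 0.0002 after 56,000 iterations."""
    return BASE_LR if iteration < STEP_ITER else LATE_LR


def query_resize(height: int, width: int, short: int = 600, max_long: int = 1000) -> tuple[int, int]:
    """Scale the shorter edge to 600 px unless the longer edge would exceed 1000 px."""
    scale = short / min(height, width)
    if max(height, width) * scale > max_long:
        scale = max_long / max(height, width)
    return round(height * scale), round(width * scale)


def support_crop_box(box, img_w: int, img_h: int, context: int = 16):
    """Expand an (x1, y1, x2, y2) box by 16 px of context per side, clipped to the image."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - context), max(0, y1 - context),
            min(img_w, x2 + context), min(img_h, y2 + context))


def support_resize(crop_w: int, crop_h: int, target: int = 320):
    """Assumed: keep the aspect ratio so the longer side is 320 px, then zero-pad
    the right and bottom to 320 x 320. Returns ((w, h), (pad_right, pad_bottom))."""
    scale = target / max(crop_w, crop_h)
    w, h = round(crop_w * scale), round(crop_h * scale)
    return (w, h), (target - w, target - h)


def fuse_support_features(features: list[list[float]]) -> list[float]:
    """K-shot fusion: element-wise average of the K support feature vectors."""
    k = len(features)
    return [sum(values) / k for values in zip(*features)]


print(query_resize(480, 640))                                  # (600, 800)
print(support_crop_box((100, 50, 300, 150), 640, 480))         # (84, 34, 316, 166)
print(fuse_support_features([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # [3.0, 4.0]
```

In the second resize rule, a very wide image such as 500 × 1500 is capped by the 1000-pixel limit, so its shorter edge ends up below 600 pixels.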
