SA-Siam: A Twofold Siamese Network for Real-Time Object Tracking

Title: A Twofold Siamese Network for Real-Time Object Tracking

Authors: Anfeng He, Chong Luo, Xinmei Tian, Wenjun Zeng.

Venue: CVPR 2018

Field: single-object tracking

【paper】

【code】: An attempt to reproduce the paper's results; the project is in progress. Discussion is welcome.


New ideas: two SiamFC branches, channel attention.

Why does it work? 1. It combines deep representations (utilizes heterogeneous features), which is the main reason it outperforms its baseline SiamFC; 2. a large amount of training data (ImageNet); 3. large search regions.


Abstract:

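        The authors observe that the semantic features learned for image classification and the appearance features learned for image similarity matching are complementary. The two branches, S-Net (semantic) and A-Net (appearance), are both built on the SiamFC architecture and are trained separately. A-Net is essentially the same as SiamFC; S-Net uses a channel attention mechanism.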


1. Introduction

        The key to designing a high-performance tracker is to find expressive features and corresponding classifiers that are simultaneously discriminative and generalized. Being discriminative allows the tracker to differentiate the true target from the cluttered or even deceptive background. Being generalized means that a tracker would tolerate the appearance changes of the tracked object, even when the object is not known a priori.

        Discriminative power of a tracker: the ability to separate the target from a complex (cluttered or deceptive) background.

        Generalization ability of a tracker: the ability to cope with appearance changes of the target.

        For SiamFC, the generalization capability remains quite poor and it encounters difficulties when the target has significant appearance change. As a result, SiamFC still has a performance gap to the best online tracker.

        SiamFC generalizes poorly: when the target's appearance changes significantly, the tracker drifts. The goal of the paper is therefore to improve the generalization capability of SiamFC.

        It is widely understood that, in a deep CNN trained for an image classification task, features from deeper layers contain stronger semantic information and are more invariant to object appearance changes. These semantic features are an ideal complement to the appearance features trained in a similarity learning problem.

        In other words, high-level features from a CNN pretrained on image classification carry strong semantic information and are invariant to changes in the target's appearance (when the target deforms, the feature still represents the same target).

        For the semantic branch, we further propose a channel attention mechanism to achieve a minimum degree of target adaptation. The motivation is that different objects activate different sets of feature channels. We shall give higher weights to channels that play more important roles in tracking specific targets. This is realized by computing channel-wise weights based on the channel responses at the target object and in the surrounding context. This simplest form of target adaptation improves the discrimination power of the tracker.

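        Some feature channels (note: channels, not features) are very useful for a particular tracked target, while others contribute almost nothing for that target; hence we should give higher weights to channels that play more important roles in tracking specific targets.

        Summary:

        1. SiamFC has a weakness: it easily loses the target when the target's appearance changes drastically. Semantic features, in contrast, are invariant to appearance changes, so the two kinds of features complement each other.

        2. Different feature channels have different discriminative power for a specific tracked target. Some channels are important for tracking certain targets, while others contribute almost nothing for those targets.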


2. Related Work

2.1. Siamese Network Based Trackers

        A notable advantage of this method is that it needs no or little online training. Thus, real-time tracking can be easily achieved.

        The advantage of a fully-convolutional network is that, instead of a candidate patch of the same size as the target patch, one can provide as input to the network a much larger search image and it will compute the similarity at all translated sub-windows on a dense grid in a single evaluation.
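        The dense evaluation described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `cross_correlate` and the random features are hypothetical, and the 22×22 and 6×6 sizes are borrowed from the data dimensions listed later in these notes. The target features are slid over the search features and one inner product is taken per translated sub-window.

```python
import numpy as np

def cross_correlate(search_feat, target_feat):
    """Dense similarity between target features and every search sub-window.

    search_feat: (Hs, Ws, C) features of the search region.
    target_feat: (Ht, Wt, C) features of the target patch.
    Returns a response map of shape (Hs - Ht + 1, Ws - Wt + 1).
    """
    # View of all (Ht, Wt) sub-windows; shape (Hs-Ht+1, Ws-Wt+1, C, Ht, Wt).
    windows = np.lib.stride_tricks.sliding_window_view(
        search_feat, target_feat.shape[:2], axis=(0, 1)
    )
    # Inner product of each sub-window with the target features.
    return np.einsum("ijchw,hwc->ij", windows, target_feat)

# Hypothetical example: plant the target at offset (5, 9) in random features.
rng = np.random.default_rng(0)
search = rng.standard_normal((22, 22, 8))
target = search[5:11, 9:15].copy()
response = cross_correlate(search, target)
print(response.shape)  # (17, 17)
```

        A real tracker would obtain the same map with a single convolution call in a deep learning framework; the explicit sub-window view is only meant to make the "all translated sub-windows" idea visible.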

        Significantly better performance is achieved without much speed drop.

        SA-Siam inherits network architecture from SiamFC. We intend to improve SiamFC with an innovative way to utilize heterogeneous features.

2.2. Ensemble Trackers

        A common insight of these ensemble trackers is that it is possible to make a strong tracker by utilizing different layers of CNN features. Besides, the correlation across models should be weak. In SA-Siam design, the appearance branch and the semantic branch use features at very different abstraction levels. Besides, they are not jointly trained to avoid becoming homogeneous.

2.3. Adaptive Feature Selection

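        Different features affect different tracked targets differently, so using all features for single-object tracking is neither efficient nor effective. Recently, SENet demonstrated the effectiveness of channel-wise attention on image recognition tasks.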

        In our SA-Siam network, we perform channel-wise attention based on the channel activations. It can be looked on as a type of target adaptation, which potentially improves the tracking performance.


3. Our Approach

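        The fundamental idea behind this design: the appearance features from similarity learning and the semantic features from classification are complementary. This is the authors' key observation.

3.1. SA-Siam Network Architecture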

        The two branches are separately trained and not combined until testing time.

        The appearance branch:

        Similar to SiamFC.

        The semantic branch:

        Pretrained CNN (AlexNet), conv4/conv5 features, fusion module (1×1 ConvNet), crop operation, attention module.

        We only train the fusion module and the channel attention module.

        During testing time:

        The response maps produced by the two branches are combined by a weighted sum. Similar to SiamFC, multiple scales are searched; the authors find that using three scales strikes a good balance between performance and speed.
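        A minimal sketch of the test-time fusion follows. The helper names are hypothetical, and so is the convention that the weight scales the appearance branch while the semantic branch receives the complement; the default of 0.3 echoes the combine weight listed under the hyperparameters in these notes, which do not say which branch it applies to.

```python
import numpy as np

def combine_responses(resp_appearance, resp_semantic, weight=0.3):
    """Weighted sum of the two branch response maps.

    Assumption: `weight` multiplies the appearance response and
    (1 - weight) multiplies the semantic response.
    """
    if resp_appearance.shape != resp_semantic.shape:
        raise ValueError("response maps must have the same shape")
    return weight * resp_appearance + (1.0 - weight) * resp_semantic

def locate_peak(response):
    """Return the (row, col) index of the maximum response."""
    return np.unravel_index(np.argmax(response), response.shape)

# Hypothetical example: the two branches disagree about the peak position.
resp_a = np.zeros((17, 17))
resp_a[3, 4] = 1.0
resp_s = np.zeros((17, 17))
resp_s[10, 10] = 1.0
peak = locate_peak(combine_responses(resp_a, resp_s))
```

        With the default weight the semantic peak dominates in this toy case; raising the weight above 0.5 hands the decision to the appearance branch.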

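3.2. Channel Attention in Semantic Branch

        High-level semantic features are robust to appearance changes of the target, which makes the tracker more generalized but less discriminative, so localization is imprecise. A channel attention mechanism is designed to improve the discriminative power of the semantic branch.

        Intuitively, different channels play different roles in tracking different targets. Some channels are extremely important for tracking certain targets but dispensable for others. If we could adapt the channel importance to the tracking target, we would achieve the minimum functionality of target adaptation. To do so, not only the target but also its surrounding background matters. Therefore, the input to the proposed attention module is not the target alone but a larger region that includes the background context.

        Take the conv5 feature map as an example. Its spatial size is 22×22.

        First, the feature map is divided into a 3×3 grid; the center cell is 6×6, the same size as the target region.

        Next, max pooling is applied within each grid cell.

        Then, a two-layer multilayer perceptron (MLP) produces a coefficient for the channel.

        Finally, a sigmoid function with a bias generates the final coefficient.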
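        The first two steps (grid split and per-cell max pooling) can be sketched as follows. The function name `grid_max_pool` is hypothetical, and centering the 6×6 cell so that a 22×22 map splits as 8 + 6 + 8 along each axis is an assumption consistent with the sizes given above.

```python
import numpy as np

def grid_max_pool(feat, center=6):
    """Max-pool a (H, W, C) feature map over a 3x3 grid.

    The middle cell is `center` x `center` and is assumed to be centered,
    so a 22x22 map is split at rows/cols 0, 8, 14, 22.
    Returns an array of shape (C, 9): nine pooled values per channel.
    """
    h, w, c = feat.shape

    def edges(n):
        start = (n - center) // 2
        return [0, start, start + center, n]

    rows, cols = edges(h), edges(w)
    pooled = np.empty((c, 9), dtype=feat.dtype)
    k = 0
    for i in range(3):
        for j in range(3):
            cell = feat[rows[i]:rows[i + 1], cols[j]:cols[j + 1], :]
            pooled[:, k] = cell.max(axis=(0, 1))  # one value per channel
            k += 1
    return pooled

# Hypothetical example on a random conv5-sized feature map.
feat = np.random.default_rng(0).standard_normal((22, 22, 256))
pooled = grid_max_pool(feat)  # shape (256, 9)
```

        Cell index 4 is the center cell covering the target; the other eight cells summarize the surrounding context.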

3.3. Discussions of Design Choices

        We separately train the two branches.

        We do not fine-tune S-Net.

        We keep A-Net as it is in SiamFC.

4. Experiments

4.1. Implementation Details

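        Network structure: A-Net has exactly the same network structure as SiamFC. S-Net uses an AlexNet pretrained on ImageNet; the stride is changed slightly so that the output of S-Net has the same size as that of A-Net.

        In the attention module, the pooled features are stacked into a 9-dimensional vector. The following MLP has one hidden layer with 9 neurons and uses the ReLU nonlinearity. A sigmoid function with a bias of 0.5 is applied last; this is to ensure that no channel will be suppressed to zero.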
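        The mapping from the 9-dimensional pooled vector to a channel coefficient can be sketched as below. The function name, the parameter names and the random initial weights are hypothetical, and sharing one MLP across all channels of a layer is an assumption of this sketch. The point it illustrates is the range: a sigmoid output in (0, 1) shifted by 0.5 lies in (0.5, 1.5), so no channel is scaled to zero.

```python
import numpy as np

def channel_weights(pooled, w1, b1, w2, b2, bias=0.5):
    """Map pooled features to one coefficient per channel.

    pooled: (C, 9) pooled values per channel.
    w1: (9, 9), b1: (9,)  hidden layer with 9 ReLU units.
    w2: (9,),  b2: scalar output layer.
    Returns (C,) coefficients, each sigmoid(...) + bias.
    """
    hidden = np.maximum(pooled @ w1 + b1, 0.0)      # ReLU hidden layer
    logits = hidden @ w2 + b2                       # one logit per channel
    return 1.0 / (1.0 + np.exp(-logits)) + bias     # shifted sigmoid

# Hypothetical example with small random parameters.
rng = np.random.default_rng(0)
pooled = rng.standard_normal((256, 9))
w1, b1 = 0.1 * rng.standard_normal((9, 9)), np.zeros(9)
w2, b2 = 0.1 * rng.standard_normal(9), 0.0
xi = channel_weights(pooled, w1, b1, w2, b2)

# Re-weight a (22, 22, 256) feature map channel by channel via broadcasting.
feat = rng.standard_normal((22, 22, 256))
reweighted = feat * xi
```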

        Data dimensions:

        input: 127×127×3 (target patch), 255×255×3 (search region).

        output: 6×6×256 (target), 22×22×256 (search region).

        conv4: 24×24×384.

        conv5: 22×22×256.

        response maps: 17×17.
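        These sizes can be checked with a little arithmetic. The layer list below is an assumption: it is the unpadded SiamFC-style AlexNet configuration as commonly described (kernel sizes and strides are not given in these notes), used here only to show how 127 and 255 shrink to 6 and 22, and why the response map is 17×17.

```python
def conv_out(size, kernel, stride=1):
    """Output size of an unpadded convolution or pooling layer."""
    return (size - kernel) // stride + 1

# (name, kernel, stride); assumed SiamFC-style AlexNet without padding.
LAYERS = [
    ("conv1", 11, 2),
    ("pool1", 3, 2),
    ("conv2", 5, 1),
    ("pool2", 3, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("conv5", 3, 1),
]

def spatial_sizes(input_size):
    """Return the spatial size after each layer for a square input."""
    sizes = {}
    size = input_size
    for name, kernel, stride in LAYERS:
        size = conv_out(size, kernel, stride)
        sizes[name] = size
    return sizes

target = spatial_sizes(127)   # target patch path
search = spatial_sizes(255)   # search region path
# Sliding a 6x6 template over a 22x22 map gives 22 - 6 + 1 positions per axis.
response = search["conv5"] - target["conv5"] + 1
print(target["conv5"], search["conv4"], search["conv5"], response)  # 6 24 22 17
```

        Under this assumed configuration the search path gives 24 at conv4 and 22 at conv5, matching the sizes listed above.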

        Training:

        Trained on ILSVRC-2015, using only color images. Implemented in TensorFlow. The average test speed is 50 fps.

        Hyperparameters:

        Combine weight = 0.3; three scales.

4.2. Datasets and Evaluation Metrics

        OTB:

        VOT:

4.3. Ablation Analysis

        The semantic branch and the appearance branch complement each other.

        Using multilevel features and channel attention brings gains.

        Separate vs. joint training.

4.4. Comparison with State-of-the-Arts

        OTB benchmarks.

        VOT2015 benchmark.

        VOT2016 benchmark.

        VOT2017 benchmark.

5. Conclusion

        In the future, we plan to continue exploring the effective fusion of deep features in the object tracking task.
