PaperReading3: MDNet

Excerpted from:
[Learning Multi-Domain Convolutional Neural Networks for Visual Tracking]
Authors: Hyeonseob Nam, Bohyung Han
Reference: https://zhuanlan.zhihu.com/p/25312850

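  • Although tracked targets vary widely, they share properties that are worth learning with a network (such as robustness to illumination changes, motion blur, scale variations, etc.), so video tracking data can be used to train a CNN.
  • The same kind of object is the foreground in one sequence and the background in another, targets differ considerably from sequence to sequence, and occlusion and deformation add further difficulty.
  • Most existing pretrained networks target tasks such as object detection, classification, and segmentation, and they are large because they must distinguish many object classes. In tracking, a network only needs to separate two classes, target and background, and targets are usually fairly small, so such a large network is unnecessary and only adds computational cost.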

Multi-domain representation learning (learn the shared properties with a CNN in advance)
with
online visual tracking (learn the sequence-specific properties online)


Multi-domain representation learning part
Separate domain-independent information from domain-specific information and learn generic feature representations for visual tracking.


MDNet architecture

  • Input: the network input is a 107x107 bounding-box patch; this size is chosen so that conv3 produces a 3x3 feature map.
    This input size is designed to obtain 3×3 feature maps in conv3:
    107 = 75 (receptive field) + 2 × 16 (stride).
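  • Convolutional layers: conv1-conv3 are taken from the VGG-M [1] network; only the input size is changed.
  • Fully connected layers: the next two fully connected layers, fc4 and fc5, each have 512 output units (fc4 and fc5 are combined with ReLUs and dropout). fc6 is a binary classification layer (domain-specific layers: a binary classification layer with softmax cross-entropy loss). There are K of them, one per branch (i.e., one per training video). At each training step only the fc6 belonging to the current video is used, while all preceding layers are shared.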

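Training procedure
K videos, N cycles.
Mini-batch: randomly sample 8 frames from one video, then randomly sample 32 positive and 96 negative examples from those 8 frames, giving 128 boxes.
Each cycle consists of K iterations, each drawing its mini-batch from a different one of the K videos.
Optimized with SGD.
Each video has its own fc6 layer.
This training lets the network learn what the targets in all the videos have in common:
a generic target representation in the shared layers.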
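The multi-domain schedule above can be sketched in plain Python. This is a minimal sketch: the data layout (per sequence, a list of per-frame positive and negative box lists) and all function names are hypothetical, the network is replaced by update counters, and the fixed round-robin order over the K videos is an assumption about how each cycle is ordered.

```python
import random

def multidomain_schedule(num_domains, num_cycles):
    """Yield the sequence index used at each iteration: iteration i uses i mod K."""
    for i in range(num_domains * num_cycles):
        yield i % num_domains

def sample_minibatch(frames, rng, n_frames=8, n_pos=32, n_neg=96):
    """Draw one mini-batch from a single sequence.

    frames: list of (positive_boxes, negative_boxes), one entry per frame.
    Picks 8 frames, then 32 positives and 96 negatives from those frames.
    """
    chosen = rng.sample(range(len(frames)), n_frames)
    pos_pool = [box for i in chosen for box in frames[i][0]]
    neg_pool = [box for i in chosen for box in frames[i][1]]
    return rng.sample(pos_pool, n_pos), rng.sample(neg_pool, n_neg)

def train(sequences, num_cycles, seed=0):
    """Toy loop that only counts which parameters each iteration would update."""
    rng = random.Random(seed)
    shared_steps = 0                      # conv1-fc5: updated at every iteration
    branch_steps = [0] * len(sequences)   # fc6^k: updated only on its own sequence
    for k in multidomain_schedule(len(sequences), num_cycles):
        pos, neg = sample_minibatch(sequences[k], rng)
        assert len(pos) + len(neg) == 128
        # A real implementation would forward the 128 boxes through conv1-fc5
        # and branch fc6^k, then take one SGD step on the softmax cross-entropy loss.
        shared_steps += 1
        branch_steps[k] += 1
    return shared_steps, branch_steps
```

With 3 sequences and 4 cycles, `train` reports 12 shared updates and 4 updates per branch, which is the point of the design: the shared layers see every video, each fc6 sees only its own.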

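Supplement: integrate a hard negative mining step into minibatch selection.
The idea is to make the negatives progressively harder to classify, so that the network's discriminative power keeps improving.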

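  • Hard negative mining
    Pick the hardest negatives, i.e., those with the highest positive scores (especially the false positives).
In each iteration of the learning procedure, a mini-batch consists of Mp positives and Mn hard negatives.
Where do the Mn hard negatives come from?
Evaluate M (>> Mn) negatives and keep the Mn with the highest scores.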

This approach examines a predefined number of
samples and identifies critical negative examples effectively
without explicitly running a detector to extract false positives as in the standard hard negative mining techniques.
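The selection step described above reduces to a top-k over the network's positive-class scores. This is a minimal sketch: it assumes the scores for the M candidate negatives have already been computed by a forward pass, and the function name is hypothetical.

```python
import heapq
import random

def hard_negative_mining(neg_scores, num_hard=96):
    """Return indices of the num_hard negatives with the highest positive-class score.

    neg_scores: positive-class scores f+(x) for M candidate negatives
    (M = 1024 and num_hard = 96 in the implementation details of these notes).
    """
    return heapq.nlargest(num_hard, range(len(neg_scores)), key=neg_scores.__getitem__)

# Usage with made-up scores standing in for real network outputs.
rng = random.Random(0)
scores = [rng.random() for _ in range(1024)]
hard = hard_negative_mining(scores, 96)
print(len(hard))  # 96
```

Only the forward pass over M negatives is needed; no separate detector run is required, which is the advantage the quoted passage points out.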

Only the weights in the fully connected layers w4:6 are updated online whereas the ones in the convolutional layers w1:3 are fixed throughout tracking; this strategy is beneficial to not only computational efficiency but also avoiding overfitting by preserving domain-independent information.


At test time, the trained network gets a newly created fc6 layer; fc4-fc6 are fine-tuned online while the convolutional layers stay fixed.
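The frozen/fine-tuned split at test time can be written down as a small plan. This is a sketch: the layer names, the dict-of-weights representation, and the helper names are hypothetical, while the learning rates, iteration counts, and the 3x factor are copied from the Implementation Details list in these notes.

```python
# Per-layer learning rates during tracking; None means the layer stays frozen.
INIT_FRAME_LR = {"conv1": None, "conv2": None, "conv3": None,
                 "fc4": 1e-4, "fc5": 1e-4, "fc6": 1e-3}
INIT_ITERS = 30        # first frame of a test sequence
UPDATE_ITERS = 10      # each later online update
UPDATE_LR_FACTOR = 3   # online updates use 3x the first-frame learning rates

def online_plan(phase):
    """Return (iterations, per-layer learning rates) for 'init' or 'update'."""
    if phase == "init":
        return INIT_ITERS, dict(INIT_FRAME_LR)
    if phase == "update":
        lrs = {name: None if lr is None else lr * UPDATE_LR_FACTOR
               for name, lr in INIT_FRAME_LR.items()}
        return UPDATE_ITERS, lrs
    raise ValueError(f"unknown phase: {phase!r}")

def build_test_head(pretrained):
    """Keep conv1-fc5 from offline training, drop the K branches, add one new fc6.

    pretrained: hypothetical dict layer name -> weights, with branches named
    'fc6_0' ... 'fc6_{K-1}'. None marks weights that still need initialising.
    """
    net = {name: w for name, w in pretrained.items() if not name.startswith("fc6_")}
    net["fc6"] = None
    return net
```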


Online visual tracking

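  • Input:
    For each new frame, 256 candidates are sampled around the previous frame's target position from a multivariate Gaussian over three dimensions (horizontal translation, vertical translation, and scale). Each candidate is resized to 107x107 and fed through the network separately.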

  • Output:
    A two-dimensional vector giving the probabilities that the bounding box is foreground (target) or background.
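Candidate generation and target selection can be sketched as follows. Treat the numbers as assumptions: the standard deviations (0.3r for translation with r the mean of width and height, 0.5 for the scale exponent) and the 1.05 scale base are recalled from the MDNet paper, not taken from these notes, and cropping/warping each box to 107x107 is left out.

```python
import random

def sample_candidates(cx, cy, w, h, n=256, rng=None,
                      trans_std=0.3, scale_std=0.5, scale_base=1.05):
    """Draw n candidate boxes (cx, cy, w, h) around the previous target state.

    ASSUMPTION: (dx, dy, s) ~ N(0, diag((0.3r)^2, (0.3r)^2, 0.5^2)), r = (w + h) / 2,
    and the box size is multiplied by scale_base ** s (aspect ratio is kept).
    """
    if rng is None:
        rng = random.Random()
    r = (w + h) / 2
    cands = []
    for _ in range(n):
        factor = scale_base ** rng.gauss(0.0, scale_std)
        cands.append((cx + rng.gauss(0.0, trans_std * r),
                      cy + rng.gauss(0.0, trans_std * r),
                      w * factor, h * factor))
    return cands

def select_target(cands, pos_scores):
    """Pick the candidate whose foreground (positive) probability is highest."""
    best = max(range(len(cands)), key=pos_scores.__getitem__)
    return cands[best], pos_scores[best]
```

The positive score used here is the foreground entry of each candidate's two-dimensional output.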


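Update strategy
The online update is conducted to model long-term and short-term appearance variations of a target for robustness and adaptiveness, respectively.
Two update modes are used: long-term and short-term.
During tracking, previously tracked target results are stored as positive samples (only when their score exceeds a threshold).
Long-term updates use the positive samples from the most recent 100 frames (older ones are discarded) and are run at a fixed interval (every 8 frames in the code).
Short-term updates use the samples from the most recent 20 frames (older ones are discarded) and are run whenever the target score drops below 0.5. Negative samples are always collected in the short-term manner.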
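The bookkeeping behind these two update modes fits in a small class. This is a sketch: the class and method names are hypothetical, each stored entry is assumed to be the sample set of one frame, and the defaults (100 and 20 frames, an interval of 8 as stated for the code in these notes, a 0.5 score threshold) come from the text above.

```python
from collections import deque

class UpdateMemory:
    """Sample pools for long-term and short-term online updates.

    Positives are kept for the last `long_frames` frames and negatives only for
    the last `short_frames` frames; deques with maxlen drop the oldest entries.
    """

    def __init__(self, long_frames=100, short_frames=20, interval=8, threshold=0.5):
        self.pos = deque(maxlen=long_frames)
        self.neg = deque(maxlen=short_frames)
        self.short_frames = short_frames
        self.interval = interval
        self.threshold = threshold

    def step(self, t, score, pos_samples, neg_samples):
        """Record frame t's samples and decide which update (if any) to run.

        Returns (kind, positives, negatives) with kind in {"short", "long", None}.
        """
        if score > self.threshold:            # only store samples from confident frames
            self.pos.append(pos_samples)
            self.neg.append(neg_samples)
        if score < self.threshold:            # tracking looks unreliable: short-term update
            return "short", list(self.pos)[-self.short_frames:], list(self.neg)
        if t % self.interval == 0:            # periodic long-term update
            return "long", list(self.pos), list(self.neg)
        return None, [], []
```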


Bounding Box Regression
Due to the high-level abstraction of CNN-based features and our data augmentation strategy which samples multiple positive examples around the target (which will be described in more detail in the next subsection), our network sometimes fails to find tight bounding boxes enclosing the target.
The best candidate is not used directly as the target; a bounding-box regression step follows, done the same way as in R-CNN.

Given the first frame of a test sequence, we train a simple linear regression model to predict the precise target location using conv3 features of the samples near the target location. In the subsequent frames, we adjust the target locations estimated from Eq. (1) using the regression model if the estimated targets are reliable.
The bounding box regressor is trained only in the first frame since it is time consuming for online update and incremental learning of the regression model may not be very helpful considering its risk.
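Since the regression follows R-CNN, the box parameterization can be sketched as the usual encode/decode pair. This is a sketch under assumptions: boxes are (cx, cy, w, h), `predict_deltas` is a hypothetical stand-in for the linear model on conv3 features trained in the first frame, and the 0.5 reliability gate is an assumption about what "reliable" means here.

```python
import math

def encode(proposal, gt):
    """R-CNN style regression targets (dx, dy, dw, dh) for boxes (cx, cy, w, h)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode(proposal, deltas):
    """Apply predicted deltas to a proposal box."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))

def refine(box, score, predict_deltas, threshold=0.5):
    """Adjust the tracker's box only when its score suggests the estimate is reliable.

    predict_deltas: hypothetical callable box -> (dx, dy, dw, dh), standing in for
    the first-frame linear regressor. ASSUMPTION: reliability means score > 0.5.
    """
    return decode(box, predict_deltas(box)) if score > threshold else box
```

Training the regressor amounts to fitting `encode(sample, ground_truth)` from conv3 features of the first-frame samples; the sketch leaves that fit out.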


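Why MDNet works well, in summary:

  • It uses CNN features from a network designed specifically for tracking and trained on tracking datasets.
  • It fine-tunes online; this makes it slow, but it matters a lot for the results.
  • Candidate sampling also covers scale, so the tracker is relatively robust on videos with scale changes.
  • Hard negative mining and bounding-box regression make the results more precise.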

Why a relatively shallow network is used
1,visual tracking aims to distinguish only two classes, target and background, which requires much less model complexity
2, a deep CNN is less effective for precise target localization since the spatial information tends to be diluted as a network goes deeper (i.e., deeper layers become more abstract?)
3, since targets in visual tracking are typically small, it is desirable to make the input size small, which reduces the depth of the network naturally.
4, a smaller network is obviously more efficient in visual tracking problem, where training and testing are performed online.


Online Tracking Algorithm


Implementation Details

  • Convolutional layers: initialized from the VGG-M network (pretrained on ImageNet).
  • Network learning: for multi-domain learning with K training sequences, train the network for 100K iterations with learning rate 0.0001 for the convolutional layers and 0.001 for the fully connected layers.
  • At the initial frame of a test sequence: train the fully connected layers for 30 iterations with learning rate 0.0001 for fc4-5 and 0.001 for fc6.
  • online update: train the fully connected layers for 10 iterations with the learning rate three times larger than that in the initial frame for fast adaptation.
  • momentum: 0.9
  • weight decay: 0.0005
  • Mini-batch: M+ (= 32) positives and Mh− (= 96) hard negatives selected out of M− (= 1024) negative examples.
  • offline multi-domain learning: collect 50 positive and 200 negative samples from every frame, where positive and negative examples have ≥ 0.7 and ≤ 0.5 IoU overlap ratios with ground-truth bounding boxes, respectively.
  • Online learning: collect St+ (= 50) positive and St− (= 200) negative samples with ≥ 0.7 and ≤ 0.3 IoU overlap ratios with the estimated target bounding boxes, respectively (except in the first frame, where S1+ = 500 and S1− = 5000).
  • bounding-box regression: use 1000 training examples with the same parameters as [13].
  • For offline training of MDNet, we use 58 training sequences collected from VOT2013 [26], VOT2014 [25] and VOT2015 [1], excluding the videos included in OTB100.
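The IoU thresholds in the list above decide how a sampled box is labeled. This is a minimal sketch: boxes are assumed to be (x, y, w, h) with (x, y) the top-left corner, the function names are hypothetical, and boxes whose overlap falls between the two thresholds are simply discarded.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

# (minimum IoU for a positive, maximum IoU for a negative)
THRESHOLDS = {"offline": (0.7, 0.5), "online": (0.7, 0.3)}

def label(box, reference, mode):
    """1 = positive, 0 = negative, None = ambiguous overlap (sample not used)."""
    pos_min, neg_max = THRESHOLDS[mode]
    overlap = iou(box, reference)
    if overlap >= pos_min:
        return 1
    if overlap <= neg_max:
        return 0
    return None
```

A box with IoU 1/3 against the reference is a negative offline but is dropped online, because the online negative threshold (0.3) is stricter.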

Experiment

  • Compared with other trackers on OTB50 and OTB100



    The exceptional scores at mild thresholds mean our tracker hardly misses targets, while the competitive scores at strict thresholds imply that our algorithm also finds tight bounding boxes to targets.

  • It also performs well under fast motion, background clutter, low resolution, occlusion, illumination variation, plane rotation, out of view, and scale variation.



In particular, our tracker successfully tracks targets in low resolution while all the trackers based on low-level features are not successful in the challenge.

  • Compared against a single-domain network and against variants without bounding-box regression and without both bounding-box regression and hard negative mining, showing that each of these components contributes to performance.


  • Figure 7 shows a few failure cases of our algorithm; slight target
    appearance change causes a drift problem in Coupon sequence, and dramatic appearance change makes our tracker miss the target completely in Jump sequence.
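The "mild" and "strict" thresholds in the experiment notes refer to the OTB plots: the precision plot thresholds the center location error in pixels, and the success plot thresholds the IoU with ground truth. This is a minimal sketch with hypothetical function names, assuming boxes are (x, y, w, h) with (x, y) the top-left corner.

```python
import math

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def center_error(a, b):
    """Euclidean distance between box centers, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

def precision(pred, gt, px):
    """Fraction of frames whose center error is within px pixels."""
    return sum(center_error(p, g) <= px for p, g in zip(pred, gt)) / len(gt)

def success(pred, gt, t):
    """Fraction of frames whose IoU with ground truth exceeds t."""
    return sum(iou(p, g) > t for p, g in zip(pred, gt)) / len(gt)
```

A tracker that stays near the target but with loose boxes scores well at a large pixel threshold or a low IoU threshold (mild) and worse at strict ones, which is the distinction the quoted sentence draws.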


Furthermore, MDNet works well with imprecise re-initializations as shown in the region noise experiment results, which implies that it can be effectively combined with a re-detection module and achieve long-term tracking.
