CUVideo(T-CNN)
1.Use the Kalman Filter to smooth the bounding box locations.
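The Kalman filter is an efficient recursive filter (autoregressive filter). A typical use is to predict the position coordinates and velocity of an object from a finite, noisy (and possibly biased) sequence of observations of its position. It appears in many engineering applications, such as radar and computer vision, and it is also an important topic in control theory and control systems engineering.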
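As a rough illustration of the smoothing step, the sketch below runs a scalar Kalman filter independently over each bounding box coordinate. It assumes a random-walk motion model and a forward pass only; the function names and the `process_var` and `meas_var` values are hypothetical and are not taken from the T-CNN paper.

```python
def kalman_filter_1d(measurements, process_var=1e-2, meas_var=1.0):
    """Filter a 1-D sequence with a scalar random-walk Kalman filter."""
    if not measurements:
        return []
    x = float(measurements[0])  # state estimate, initialised from the first observation
    p = meas_var                # variance of the state estimate
    out = [x]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, so only the uncertainty grows.
        p = p + process_var
        # Update: blend the prediction and the new measurement via the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        out.append(x)
    return out


def smooth_boxes(boxes, **kwargs):
    """Filter each coordinate of per-frame (x1, y1, x2, y2) boxes independently."""
    per_coord = [kalman_filter_1d(list(c), **kwargs) for c in zip(*boxes)]
    return [tuple(vals) for vals in zip(*per_coord)]


boxes = [(10, 10, 50, 50), (12, 10, 52, 50), (8, 10, 48, 50), (10, 10, 50, 50)]
smoothed = smooth_boxes(boxes)
```

A constant-velocity state (position plus velocity) would track moving objects better; the random-walk model is used here only to keep the example short.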
2.Main framework
(1)Still-image Detection:
Two networks are used together for detection: DeepID-Net [8], an extension of R-CNN [3], and CRAFT, an extension of Faster R-CNN.
For the provided data track, we pretrained VGG [2] and GoogLeNet [1] with batch normalization (BN).
Experiments also show that, during fine-tuning, a 2:1 DET to VID data ratio achieves the best performance.
(2)Multi-context suppression (MCS)
Some false positive detections have very large detection scores. Only using the context information within these frames cannot distinguish them from the positive samples. However, considering the detection results on other frames, we can easily determine that the majority of high-confidence detections are other classes and these positive detections are outliers. In other words, detections on other frames are used to correct false detections that come from relying on the current frame's context alone.
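Running an image detector on video frames as if they were independent images does not make full use of the context of the whole video. Objects of any class may appear in videos in general, but a single video clip contains only a few classes, and those classes tend to co-occur (a clip with boats may also contain whales, but is very unlikely to contain zebras). The detection results over the whole clip can therefore be analyzed statistically: sort all detection windows by score and select the high-scoring classes; the remaining low-scoring classes are probably false detections, and their scores should be suppressed (see Figure 2). After MCS, correct classes rank higher and wrong classes rank lower in the detection results, which improves detection accuracy.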
The classes of detection scores beyond a threshold are regarded as high-confidence classes and the rest as low-confidence classes. That is, the detection scores are used to split classes by confidence.
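The MCS idea above can be sketched as follows. This is a minimal illustration, assuming that suppression means subtracting a fixed penalty from the scores of low-confidence classes; the dictionary layout, `score_threshold`, and `penalty` are hypothetical choices, not values from the paper.

```python
def multi_context_suppression(detections, score_threshold=0.5, penalty=0.4):
    """Suppress scores of classes that never score highly anywhere in the video.

    detections: list of dicts with keys "frame", "cls", "score", pooled over
    all frames of one video clip.
    """
    # Classes with at least one detection above the threshold are high-confidence.
    high_conf = {d["cls"] for d in detections if d["score"] > score_threshold}
    suppressed = []
    for d in detections:
        score = d["score"]
        if d["cls"] not in high_conf:
            # Assumed suppression rule: subtract a fixed penalty.
            score -= penalty
        suppressed.append({**d, "score": score})
    return suppressed


dets = [
    {"frame": 0, "cls": "boat", "score": 0.9},
    {"frame": 1, "cls": "whale", "score": 0.8},
    {"frame": 1, "cls": "zebra", "score": 0.3},
    {"frame": 2, "cls": "boat", "score": 0.2},
]
result = multi_context_suppression(dets)
```

Note that the low-scoring boat detection in frame 2 keeps its score, because boat is a high-confidence class elsewhere in the clip, while the zebra detection is pushed down.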
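Motion-guided propagation (MGP): uses optical flow. It addresses two causes of missed detections in individual frames:
1) There are no region proposals covering enough areas of the objects, so some objects are not detected at all;
2) Bad pose or motion blur of an object (shake, blur).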
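To make the propagation step concrete, here is a small sketch that shifts a box into the next frame by the mean optical flow vector inside it. Using the mean flow, and the dense flow layout `flow[y][x] = (dx, dy)`, are assumptions made for illustration; computing the flow field itself is out of scope here.

```python
def propagate_box(box, flow):
    """Shift a box (x1, y1, x2, y2) by the mean flow vector inside it.

    flow: 2-D grid where flow[y][x] is the (dx, dy) displacement of pixel
    (x, y) from the current frame to the adjacent frame.
    """
    x1, y1, x2, y2 = box
    height, width = len(flow), len(flow[0])
    # Clip the box to the image so every index is valid.
    xs = range(max(0, int(x1)), min(width, int(x2)))
    ys = range(max(0, int(y1)), min(height, int(y2)))
    n = len(xs) * len(ys)
    if n == 0:
        return box  # box lies outside the image; leave it unchanged
    dx = sum(flow[y][x][0] for y in ys for x in xs) / n
    dy = sum(flow[y][x][1] for y in ys for x in xs) / n
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


# Uniform flow: every pixel moves 2 px right and 1 px up.
flow = [[(2.0, -1.0) for _ in range(10)] for _ in range(10)]
moved = propagate_box((2, 2, 6, 6), flow)
```

A detection propagated this way can fill in for a frame where the still-image detector missed the object or scored it poorly.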
(3)Tubelet re-scoring
MGP only handles short-term propagation; longer-term propagation needs tracking algorithms.
High-confidence tracking:
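The box with the highest confidence is chosen as the anchor and tracked. Once the tracking failure for that anchor reaches a threshold, a new anchor is selected. Because high-confidence tracking makes boxes pile up heavily in space and time, an NMS-like mechanism is also applied, based on the overlap between track boxes and other boxes.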
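One way to read the NMS-like step is that a detection overlapping an existing track box too much is not allowed to become a new anchor. The sketch below follows that reading; `next_anchor`, the `(box, score)` tuple layout, and the `iou_thresh` value are hypothetical, and the tracker itself is not shown.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def next_anchor(detections, tracked_boxes, iou_thresh=0.3):
    """Pick the highest-scoring detection not already covered by a track box.

    detections: list of (box, score); tracked_boxes: boxes of existing tracks
    in the same frame. Returns None when every detection is suppressed.
    """
    candidates = [
        d for d in detections
        if all(iou(d[0], t) <= iou_thresh for t in tracked_boxes)
    ]
    return max(candidates, key=lambda d: d[1]) if candidates else None


dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.6)]
first = next_anchor(dets, [])
second = next_anchor(dets, [(0, 0, 10, 10)])
```

Here the second anchor skips the 0.8 detection because it overlaps the first track heavily, and moves on to the distant 0.6 detection.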
Spatial max-pooling:
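The tracking described above is still not accurate enough. The spatial max-pooling process is to replace tubelet box proposals with detections of higher confidence by the still-image object detector.
We first obtain the detections from the still-image object detector whose overlap exceeds a threshold; here, overlap means the overlap between a detection and a track box. The higher the overlap threshold, the more confidence is placed on the tubelet box.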
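A minimal sketch of spatial max-pooling for one tubelet box in one frame is given below. It assumes the tubelet box is replaced only when an overlapping detection has a strictly higher score; the function names and the `overlap_thresh` default are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def spatial_max_pool(track_box, track_score, detections, overlap_thresh=0.5):
    """Replace a tubelet box with the best still-image detection overlapping it.

    detections: list of (box, score) from the still-image detector for the
    same frame. Returns the chosen (box, score).
    """
    best_box, best_score = track_box, track_score
    for box, score in detections:
        # Only detections that overlap the track box enough are considered.
        if iou(track_box, box) > overlap_thresh and score > best_score:
            best_box, best_score = box, score
    return best_box, best_score


dets = [((1, 0, 11, 10), 0.7), ((0, 0, 10, 10), 0.6), ((40, 40, 50, 50), 0.99)]
pooled = spatial_max_pool((0, 0, 10, 10), 0.4, dets)
strict = spatial_max_pool((0, 0, 10, 10), 0.4, dets, overlap_thresh=0.9)
```

Raising `overlap_thresh` narrows the set of detections that may replace the box, which matches the note that a higher threshold expresses more confidence in the tubelet box.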
Tubelet classification and rescoring:
The main idea of temporal rescoring is to classify tubelets into positive and negative samples and map the detection scores into different ranges to increase the score margins.
Experiments show that the top-k statistic (i.e. the kth largest detection score from a tubelet) works best.
After classification, the detection scores of positive samples are min-max mapped to [0.5, 1], while negatives are mapped to [0, 0.5]. Thus, the tubelet detection scores are globally changed so that the margins between positive and negative tubelets are increased.
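The rescoring steps above can be sketched as follows. For illustration, a tubelet is classified as positive by thresholding its top-k score, which is an assumption standing in for whatever classifier is actually used; `k`, `pos_threshold`, and the pooling of scores across all tubelets of the same label are likewise hypothetical choices.

```python
def top_k_score(scores, k):
    """Return the kth largest score of a non-empty tubelet (or the smallest if shorter)."""
    ranked = sorted(scores, reverse=True)
    return ranked[min(k, len(ranked)) - 1]


def rescore_tubelets(tubelets, k=2, pos_threshold=0.5):
    """Map positive tubelet scores to [0.5, 1] and negative ones to [0, 0.5].

    tubelets: list of non-empty per-frame score lists, one list per tubelet.
    """
    is_pos = [top_k_score(t, k) >= pos_threshold for t in tubelets]
    out = [None] * len(tubelets)
    for label, (lo, hi) in ((True, (0.5, 1.0)), (False, (0.0, 0.5))):
        idx = [i for i, p in enumerate(is_pos) if p == label]
        pooled = [s for i in idx for s in tubelets[i]]
        if not pooled:
            continue
        s_min, s_max = min(pooled), max(pooled)
        span = s_max - s_min
        for i in idx:
            # Min-max normalise within the group, then scale into [lo, hi].
            # If all scores in the group are equal, place them mid-range.
            out[i] = [
                lo + (hi - lo) * ((s - s_min) / span if span > 0 else 0.5)
                for s in tubelets[i]
            ]
    return out


tubelets = [[0.9, 0.8, 0.2], [0.6, 0.1, 0.3]]
rescored = rescore_tubelets(tubelets)
```

In this example the first tubelet is positive (its second-largest score is 0.8) and the second is negative (0.3), so even the 0.2 frame of the positive tubelet ends up at or above every score of the negative one.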