Paper | Tracking Every Thing in the Wild

1 Foreword

2 Introduction

MOT (multiple object tracking) task: estimate the trajectories of objects in a video sequence.

Limitation 1: common MOT benchmarks [16,32,11] only consider tracking objects from a very small set of predefined categories, e.g., pedestrian and car. As a result, existing MOT methods do not perform well when there are a large number of categories.

Limitation 2: existing MOT metrics are not well suited to this setting and need to be refined.

Current MOT models and metrics are mainly designed for single-category multiple-object tracking. When they are extended to large-scale multi-category MOT, methods simply detect and classify each object and then associate only objects that share the same label. This relies heavily on the classification results.
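The label-gated association described above can be sketched with a toy greedy IoU matcher. This is not any specific tracker. The function names, the `(x1, y1, x2, y2)` box format and the 0.5 IoU threshold are all hypothetical. The sketch shows how a single misclassification breaks a track when labels act as a hard gate.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Sequence[Tuple[Box, str]],
              dets: Sequence[Tuple[Box, str]],
              use_labels: bool, thr: float = 0.5) -> List[Tuple[int, int]]:
    """Greedy IoU matching; if use_labels, only same-label pairs may match."""
    candidates = []
    for i, (tbox, tlabel) in enumerate(tracks):
        for j, (dbox, dlabel) in enumerate(dets):
            if use_labels and tlabel != dlabel:
                continue  # hard class prior: different labels never match
            score = iou(tbox, dbox)
            if score >= thr:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)  # highest overlaps are matched first
    used_t, used_d, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return sorted(matches)


# The same car, misclassified as a truck in the next frame.
tracks = [((0, 0, 10, 10), "car")]
dets = [((1, 1, 11, 11), "truck")]
print(associate(tracks, dets, use_labels=True))   # [] -> the track is broken
print(associate(tracks, dets, use_labels=False))  # [(0, 0)]
```

The two boxes overlap with IoU of about 0.68, so only the label gate prevents the match.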

Thus, when classification is inaccurate, as is typical in large-scale multi-category MOT, both the existing models and the evaluation metrics need to be improved.

This paper:
To expand tracking to a more general scenario, we propose that classification should be disentangled from tracking, in both evaluation and model design, for multi-category MOT.

  1. a new metric, Track Every Thing Accuracy (TETA);
  2. a new model, Track Every Thing tracker (TETer).

Experiments:
large-scale multi-category tracking datasets, TAO and BDD100K.

3 Tracking-Every-Thing Metric

3.1 Limitations for Large-scale MOT Evaluation

How to handle classification: (1) associating objects only when they share the same label relies on correct classification results; (2) the most naive alternative, ignoring the classification results entirely, lets the head classes of a long-tailed dataset dominate the evaluation.
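The head-class effect in point (2) is easy to see with made-up numbers. The per-class counts below are hypothetical. The sketch compares pooling every instance together against scoring each class separately and then averaging. Per-class averaging is shown as one common remedy, not as the paper's exact procedure.

```python
# Hypothetical per-class counts: (correctly tracked instances, total instances).
counts = {
    "person": (90, 100),  # head class
    "car": (45, 50),      # head class
    "giraffe": (0, 5),    # tail class
    "kite": (1, 5),       # tail class
}


def pooled_score(counts):
    """Pool all instances and ignore class; head classes dominate the result."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total


def class_averaged_score(counts):
    """Score each class on its own, then average; every class weighs the same."""
    return sum(c / t for c, t in counts.values()) / len(counts)


print(pooled_score(counts))          # 0.85
print(class_averaged_score(counts))  # 0.5
```

The pooled score hides the fact that the tail classes are almost never tracked correctly.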

Incomplete annotations: large-scale datasets are not exhaustively annotated, so how can false-positive (FP) predictions be identified and penalized?

3.2 Tracking-Every-Thing Accuracy (TETA)

TETA consists of three parts:

  1. a localization score
  2. an association score
  3. a classification score

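Each score evaluates one aspect separately, so that, for example, a classification error is not also counted as a localization or association error.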
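As we understand the paper, the three sub-scores (called LocA, AssocA and ClsA there) are combined with equal weight into the final metric. Treat the following as a summary under that assumption:

```latex
\mathrm{TETA} = \frac{\mathrm{LocA} + \mathrm{AssocA} + \mathrm{ClsA}}{3}
```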

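To avoid falsely penalizing predictions on objects that were never annotated, TETA uses a local cluster evaluation. Each ground-truth box is the anchor of a cluster, and each prediction is assigned to the closest anchor within an IoU margin. Predictions that are not assigned to any cluster are ignored during evaluation.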
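The cluster assignment can be sketched as follows. The IoU margin `r = 0.5`, the box format and the function name are assumptions, because these notes do not give the exact value. As we read the paper, predictions inside a cluster that are not matched to its ground truth still count as false positives. Only predictions far from every annotation escape the penalty.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_to_clusters(gt_boxes: Sequence[Box], pred_boxes: Sequence[Box],
                       r: float = 0.5) -> Tuple[Dict[int, List[int]], List[int]]:
    """Assign each prediction to its highest-IoU ground-truth anchor if IoU >= r."""
    clusters: Dict[int, List[int]] = {g: [] for g in range(len(gt_boxes))}
    ignored: List[int] = []
    for p, pbox in enumerate(pred_boxes):
        best_g, best_iou = None, 0.0
        for g, gbox in enumerate(gt_boxes):
            s = iou(pbox, gbox)
            if s > best_iou:
                best_g, best_iou = g, s
        if best_g is not None and best_iou >= r:
            clusters[best_g].append(p)  # evaluated against this anchor
        else:
            ignored.append(p)  # possibly an unannotated object: not penalized
    return clusters, ignored


gt = [(0, 0, 10, 10)]
preds = [(1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 9)]
clusters, ignored = assign_to_clusters(gt, preds)
print(clusters)  # {0: [0, 2]}
print(ignored)   # [1]
```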

4 Tracking-Every-Thing Tracker

Framework: [figure: overview of the TETer framework]


4.1 Class-Agnostic Localization

The paper's analysis shows that the detector localizes objects reasonably well, so the bottleneck of the detection model lies in classification rather than localization.

Thus, this paper first localizes objects in a class-agnostic way, without relying on the predicted class labels.
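One simple way to make localization independent of the classifier is class-agnostic non-maximum suppression (NMS). Overlapping boxes are suppressed regardless of their predicted labels, so an object the classifier is unsure about yields one box instead of one per competing label. This is an illustrative assumption: the notes do not say exactly how TETer implements class-agnostic localization. The names, box format and 0.5 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(dets: Sequence[Tuple[Box, float, str]],
                       iou_thr: float = 0.5) -> List[int]:
    """Keep the highest-scoring boxes; suppress overlaps whatever their label."""
    order = sorted(range(len(dets)), key=lambda i: dets[i][1], reverse=True)
    keep: List[int] = []
    for i in order:
        # The label (dets[i][2]) is deliberately never consulted here.
        if all(iou(dets[i][0], dets[k][0]) < iou_thr for k in keep):
            keep.append(i)
    return keep


dets = [
    ((0, 0, 10, 10), 0.60, "car"),
    ((0, 0, 10, 9), 0.55, "truck"),   # same object, different label
    ((50, 50, 60, 60), 0.90, "person"),
]
print(class_agnostic_nms(dets))  # [2, 0]
```

A class-aware NMS would keep both the "car" and the "truck" box, because they carry different labels. That would produce two boxes for one object.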

4.2 Associating Every Thing

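  • Common cues for association: location, appearance, and class.
  • Motion (location): irregular in diverse in-the-wild videos, so it is not reliable (rejected).
  • Class: many objects do not belong to the predefined categories, so class labels are not reliable as a hard cue (rejected).
  • Appearance: objects of different classes usually look different, so appearance is selected as the main cue.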

Instead of using class information as a "hard" prior (associating only objects with the same label), TETer uses class information in a "soft" way through contrastive learning.
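To make "soft" class information concrete, here is a generic InfoNCE-style contrastive loss. It pulls an anchor embedding toward a positive from the same class and pushes it away from negatives from other classes. This is a textbook formulation chosen for illustration. The exact loss, temperature and sampling used by TETer are not given in these notes, so `tau = 0.1` and the toy vectors are assumptions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], tau: float = 0.1) -> float:
    """-log softmax probability of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


anchor = [1.0, 0.0]
# Same-class positive is close to the anchor: low loss.
loss_good = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Same-class positive is far and a negative is close: high loss.
loss_bad = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(loss_good < loss_bad)  # True
```

Because class only shapes the embedding space, two objects whose labels are uncertain can still be matched if their embeddings are similar.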

Once CEM (class exemplar matching) is learned, association can be done by comparing embedding similarities between objects across frames.
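A minimal similarity-based association step might look like the sketch below. It assumes each existing track and each new detection is represented by one embedding vector, and it uses greedy cosine-similarity matching with a hypothetical threshold of 0.5. Real trackers may use Hungarian matching or other schemes, and this is not claimed to be TETer's exact procedure. Unmatched detections would start new tracks.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def associate_by_similarity(track_embs: Sequence[Sequence[float]],
                            det_embs: Sequence[Sequence[float]],
                            thr: float = 0.5) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Greedily match the most similar (track, detection) pairs above thr."""
    pairs = sorted(
        ((cosine(t, d), i, j)
         for i, t in enumerate(track_embs)
         for j, d in enumerate(det_embs)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for s, i, j in pairs:
        if s < thr:
            break  # all remaining pairs are even less similar
        if i in used_t or j in used_d:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    unmatched = [j for j in range(len(det_embs)) if j not in used_d]
    return sorted(matches), unmatched


tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
dets = [[0.1, 0.9, 0.0], [0.95, 0.05, 0.0], [0.0, 0.0, 1.0]]
matches, new_tracks = associate_by_similarity(tracks, dets)
print(matches)     # [(0, 1), (1, 0)]
print(new_tracks)  # [2]
```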
