Paper | Detecting Twenty-thousand Classes using Image-level Supervision

Preface

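Source: ECCV 2022
Model name: Detic
Summary: Like the earlier OVD-Net, this paper tackles the open-vocabulary problem from the pretraining angle, but its approach is simpler and more direct: it adds the ImageNet categories directly to training. Strictly speaking this is not true open-vocabulary detection, but it lets the detector cover about 20,000 categories.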

1. Introduction:

Object detection (OD) has two subtasks: 1) finding boxes (localization); 2) naming the boxes (classification).
Previous works couple these two subtasks.
However, detection benchmarks are much smaller than classification benchmarks.

As the figure shows, both the number of images and the number of categories in LVIS (OD) are much smaller than in ImageNet (CLS).

image.png

This paper:

It proposes Detic, a detector with image classes, which uses image-level supervision in addition to detection supervision.

It decouples the localization and classification sub-problems;
it uses image-level labels to train the classifier and broaden the vocabulary of the detector.

illustration:

image.png

Standard OD: needs ground-truth boxes and labels.

Weakly supervised OD: assigns image-level labels to predicted boxes (error-prone).

This paper: assigns the image-level labels to the max-size proposal.
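
The max-size selection can be illustrated with a minimal sketch. It assumes proposals are given as `(x1, y1, x2, y2)` tuples and that "size" means box area; the names `box_area` and `max_size_proposal` are hypothetical and not taken from the Detic code.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes get area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def max_size_proposal(proposals: List[Box]) -> int:
    """Return the index of the proposal with the largest area."""
    if not proposals:
        raise ValueError("need at least one proposal")
    return max(range(len(proposals)), key=lambda i: box_area(proposals[i]))


proposals = [(0, 0, 10, 10), (0, 0, 50, 40), (5, 5, 20, 20)]
j = max_size_proposal(proposals)
# The image-level label would then be attached to proposals[j].
```

The appeal of this rule is that it does not depend on the classifier's own predictions, unlike the prediction-based assignment in weakly supervised OD.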

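Traditional OD: $C_{test} = C_{det}$, $D_{cls} = \emptyset$, i.e. the test vocabulary equals the detection training vocabulary and no classification data is used.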

OVD: allows $C_{test} \neq C_{det}$

The whole idea is quite simple: use both the detection dataset $D_{det}$ and the classification dataset $D_{cls}$ to train the detection model.

image.png

Sample a batch from both $D_{det}$ and $D_{cls}$.
If an image belongs to $D_{det}$, the loss is the typical OD loss: RPN loss + regression loss + classification loss.
If an image belongs to $D_{cls}$, the loss is the max-size loss: the proposal with the largest size is taken as the region for the image-level label, and it is then used to calculate the classification loss.
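
The per-image loss rule above can be sketched in plain Python. This is only an illustration under stated assumptions: the classification loss is taken to be a sigmoid binary cross-entropy over all classes, the detection losses are assumed to be precomputed scalars, and the dictionary keys and function names (`image_loss`, `max_size_loss`, `bce_with_logits`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean sigmoid binary cross-entropy, written in a numerically stable form."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def max_size_loss(proposals: List[Box],
                  proposal_logits: List[List[float]],
                  image_labels: Set[int],
                  num_classes: int) -> float:
    """Classification loss on the largest proposal, using image-level labels."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in proposals]
    j = max(range(len(areas)), key=areas.__getitem__)  # max-size proposal
    targets = [1.0 if c in image_labels else 0.0 for c in range(num_classes)]
    return bce_with_logits(proposal_logits[j], targets)


def image_loss(sample: Dict) -> float:
    """Dispatch on the dataset the image came from ("det" or "cls")."""
    if sample["source"] == "det":
        # Typical OD loss: RPN + box regression + classification.
        return sample["rpn_loss"] + sample["reg_loss"] + sample["cls_loss"]
    return max_size_loss(sample["proposals"], sample["logits"],
                         sample["labels"], sample["num_classes"])


batch = [
    {"source": "det", "rpn_loss": 0.5, "reg_loss": 0.25, "cls_loss": 0.25},
    {"source": "cls", "proposals": [(0, 0, 1, 1), (0, 0, 10, 10)],
     "logits": [[5.0, -5.0], [0.0, 0.0]], "labels": {0}, "num_classes": 2},
]
total_loss = sum(image_loss(s) for s in batch)
```

In a real detector the logits would come from the classification head applied to the proposal's RoI features; here they are hard-coded only to keep the sketch self-contained.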

image.png

Last edited: 2023.12.14 08:13:29
