【Paper Reading】Novel Human-Object Interaction Detection via Adversarial Domain Generalization

Paper reading

title

Novel Human-Object Interaction Detection via Adversarial Domain Generalization

authors

Yuhang Song, Wenbo Li, Lei Zhang, Jianwei Yang, Emre Kiciman, Hamid Palangi, Jianfeng Gao, C.-C. Jay Kuo, and Pengchuan Zhang

affiliations

University of Southern California
Samsung Research America AI Center
Microsoft Corporation

datasets

HICO-DET
However, this paper splits the dataset differently from previous work: in the new split, none of the HOI combinations in the test set appear in the training set.
UnRel

paper link

https://arxiv.org/pdf/2005.11406.pdf

abstract

We study in this paper the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios. The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations. As a result, most existing HOI methods heavily rely on object priors and can hardly generalize to unseen combinations. To tackle this problem, we propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction. To measure the performance improvement, we create a new split of the HICO-DET dataset, where the HOIs in the test set are all unseen triplet categories in the training set. Our experiments show that the proposed framework significantly increases the performance by up to 50% on the new split of HICO-DET dataset and up to 125% on the UnRel dataset for auxiliary evaluation in detecting novel HOIs.
In short: this paper studies the problem of novel human-object interaction (HOI) detection, aiming to improve the model's ability to generalize to unseen scenarios.

contributions

1. Our first contribution is to create a new benchmark dataset for the novel HOI detection task, based on the images and annotations from the HICO-DET dataset [2] and the UnRel dataset [28]. The new benchmark dataset avoids the overlapping of the triplet categories in the training set, validation set and test set. This new benchmark contains an additional evaluation set from UnRel dataset [28], highlighting its instances with unusual scenes.
2. Our second contribution is to propose a unified adversarial domain generalization framework, which can serve as a plug-in module for existing models to improve their generalization ability. We instantiate both conditional and unconditional methods within the framework and build its relationship with previous methods. Experiments on HICO-DET and UnRel dataset show that our proposed adversarial training can get uniformly significant improvement on all metrics. Our work shows promising results of adversarial domain generalization in conquering the combinatorial prediction problem in real-world applications.
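
A minimal PyTorch sketch of the core idea as I understand it (my own illustration under assumptions, not the authors' implementation): an adversarial object classifier is attached to the shared human-object pair features through a gradient-reversal layer, so the features used for predicate prediction are pushed to become object-invariant. The module names, dimensions, and the gradient-reversal formulation are all assumptions; the paper instantiates conditional and unconditional variants whose exact adversarial objectives differ.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales the gradient in backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class PredicateModel(nn.Module):
    """Hypothetical module: shared pair features feed a predicate classifier and,
    through gradient reversal, an adversarial object classifier."""
    def __init__(self, in_dim=2048, feat_dim=512,
                 num_predicates=117, num_objects=80, lambd=0.1):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.predicate_head = nn.Linear(feat_dim, num_predicates)
        self.object_head = nn.Linear(feat_dim, num_objects)  # the adversary
        self.lambd = lambd

    def forward(self, pair_feats):
        feats = self.backbone(pair_feats)
        pred_logits = self.predicate_head(feats)
        # The adversary tries to recover the object class from the features;
        # the reversed gradient discourages the backbone from encoding it.
        obj_logits = self.object_head(GradReverse.apply(feats, self.lambd))
        return pred_logits, obj_logits

# One illustrative training step on random tensors standing in for pooled
# human-object pair features (117 predicates / 80 objects as in HICO-DET).
model = PredicateModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

pair_feats = torch.randn(8, 2048)
predicate_labels = torch.randint(0, 117, (8,))
object_labels = torch.randint(0, 80, (8,))

pred_logits, obj_logits = model(pair_feats)
loss = criterion(pred_logits, predicate_labels) + criterion(obj_logits, object_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```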

framework

[Figure: overall framework of the proposed adversarial domain generalization approach (image not preserved)]

performance

[Figures: quantitative results from the paper (images not preserved)]

takeaways

1. Over the past few years, rapid progress has been made in visual recognition tasks, but image understanding also calls for visual relationship detection. Two evaluation settings come up repeatedly (sketched in code after this list):

  1. Predicate classification (PredCls): For each human-predicate-object triplet in the test set, predict the predicate class given the ground-truth bounding boxes and object label.
  2. Predicate detection (PredDet): For each image in the test set, detect all human-predicate-object triplets given the ground-truth bounding boxes and their associated labels.
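
To make the difference concrete, here is a rough sketch of my own (the scoring function, box format, and the 117-way predicate output are placeholder assumptions, not the benchmark's actual API): PredCls scores predicates for one given ground-truth pair, while PredDet must enumerate candidate human-object pairs from the ground-truth boxes and labels.

```python
import random

def pred_cls(score_fn, image, human_box, object_box, object_label):
    """PredCls: the ground-truth pair and object label are given;
    only the predicate class needs to be predicted."""
    return score_fn(image, human_box, object_box, object_label)

def pred_det(score_fn, image, gt_boxes, gt_labels):
    """PredDet: ground-truth boxes and labels are given, but the model must
    enumerate human-object pairs and score predicates for every candidate."""
    triplets = []
    for h_box, h_label in zip(gt_boxes, gt_labels):
        if h_label != "person":
            continue
        for o_box, o_label in zip(gt_boxes, gt_labels):
            if o_box is h_box:
                continue
            triplets.append((h_box, o_box, o_label,
                             score_fn(image, h_box, o_box, o_label)))
    return triplets

# Dummy scorer just to make the sketch runnable (117 predicate classes in HICO-DET).
def dummy_score_fn(image, h_box, o_box, o_label):
    return [random.random() for _ in range(117)]

boxes = [(0, 0, 50, 100), (60, 20, 120, 80)]
labels = ["person", "bicycle"]
print(len(pred_cls(dummy_score_fn, None, boxes[0], boxes[1], "bicycle")))  # 117 scores
print(len(pred_det(dummy_score_fn, None, boxes, labels)))                  # 1 candidate pair
```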

Terms learned:
PredCls (predicate classification)
PredDet (predicate detection)
SgDet (scene graph detection)
KLD (Kullback-Leibler divergence)
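
A quick gloss of my own (not from the notes): KLD here presumably refers to the Kullback-Leibler divergence, whose standard discrete form is

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
```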

The paper has a lot of equations; I did not fully understand them on the first read, so a second pass is needed.

second reading

To be added.