Paper reading
title
Novel Human-Object Interaction Detection via Adversarial Domain Generalization
author
Yuhang Song, Wenbo Li, Lei Zhang, Jianwei Yang, Emre Kiciman, Hamid Palangi, Jianfeng Gao, C.-C. Jay Kuo, and Pengchuan Zhang
affiliations
University of Southern California
Samsung Research America AI Center
Microsoft Corporation
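datasets
HICO-DET
This paper splits the dataset differently from previous work.
In the new split, none of the HOI combinations in the test set appear in the training set.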
UnRel
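To make the HICO-DET split idea concrete, here is a minimal sketch of a split in which the test set contains only (predicate, object) combinations that never occur in training. It is a toy illustration, not the authors' actual procedure: the function name `split_by_triplet`, the dictionary keys `predicate` and `object`, and the `test_fraction` parameter are all hypothetical, and it splits individual annotations rather than whole images.

```python
import random


def split_by_triplet(annotations, test_fraction=0.2, seed=0):
    """Split annotations so train and test share no (predicate, object) category.

    `annotations` is a list of dicts with hypothetical keys "predicate" and
    "object"; the human is implicit because every HOI triplet has a human subject.
    """
    # Collect the distinct triplet categories and shuffle them reproducibly.
    categories = sorted({(a["predicate"], a["object"]) for a in annotations})
    rng = random.Random(seed)
    rng.shuffle(categories)

    # Hold out a fraction of the categories (at least one) for testing.
    n_test = max(1, int(len(categories) * test_fraction))
    test_categories = set(categories[:n_test])

    train, test = [], []
    for a in annotations:
        key = (a["predicate"], a["object"])
        (test if key in test_categories else train).append(a)
    return train, test


if __name__ == "__main__":
    toy = [
        {"predicate": p, "object": o}
        for p in ["ride", "hold", "wash"]
        for o in ["bicycle", "horse", "cup"]
    ]
    train, test = split_by_triplet(toy, test_fraction=0.3, seed=1)
    print(len(train), "training annotations,", len(test), "test annotations")
```

Holding out whole categories, rather than random samples, is what forces the model to be judged on combinations it has never seen, even though each predicate and each object may still appear in training in other combinations.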
paper link
https://arxiv.org/pdf/2005.11406.pdf
abstract
We study in this paper the problem of novel human-object interaction (HOI) detection, aiming at improving the generalization ability of the model to unseen scenarios. The challenge mainly stems from the large compositional space of objects and predicates, which leads to the lack of sufficient training data for all the object-predicate combinations. As a result, most existing HOI methods heavily rely on object priors and can hardly generalize to unseen combinations. To tackle this problem, we propose a unified framework of adversarial domain generalization to learn object-invariant features for predicate prediction. To measure the performance improvement, we create a new split of the HICO-DET dataset, where the HOIs in the test set are all unseen triplet categories in the training set. Our experiments show that the proposed framework significantly increases the performance by up to 50% on the new split of HICO-DET dataset and up to 125% on the UnRel dataset for auxiliary evaluation in detecting novel HOIs.
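In short: the paper studies novel human-object interaction (HOI) detection, aiming to improve the model's ability to generalize to unseen scenarios.
contributions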
1. Our first contribution is to create a new benchmark dataset for the novel HOI detection task, based on the images and annotations from the HICO-DET dataset [2] and the UnRel dataset [28]. The new benchmark dataset avoids the overlapping of the triplet categories in the training set, validation set and test set. This new benchmark contains an additional evaluation set from UnRel dataset [28], highlighting its instances with unusual scenes.
2. Our second contribution is to propose a unified adversarial domain generalization framework, which can serve as a plug-in module for existing models to improve their generalization ability. We instantiate both conditional and unconditional methods within the framework and build its relationship with previous methods. Experiments on HICO-DET and UnRel dataset show that our proposed adversarial training can get uniformly significant improvement on all metrics. Our work shows promising results of adversarial domain generalization in conquering the combinatorial prediction problem in real-world applications.
framework
performance
takeaways
1. Over the past few years, rapid progress has been made in visual recognition tasks, but image understanding also calls for visual relationship detection.
- Predicate classification (PredCls): For each human-predicate-object triplet in the test set, predict the predicate class given the ground-truth bounding boxes and object label.
- Predicate detection (PredDet): For each image in the test set, detect all human-predicate-object triplets given the ground-truth bounding boxes and their associated labels.
terms to understand:
PredCls
PredDet
SgDet
KLD
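KLD most likely abbreviates the Kullback-Leibler divergence (an assumption; these notes do not expand the term). For two discrete distributions P and Q over the same outcomes, the standard definition is:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
```

It is non-negative, equals zero only when P and Q are identical, and is not symmetric in P and Q.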
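The paper contains many formulas. I did not fully understand it on the first read and need to read it a second time.
second reading
To be added.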