Cross-Level Guided Attention for Human-Object Interaction Detection

Published: 2023, Last Modified: 30 Sept 2024, Venue: ICME Workshops 2023, License: CC BY-SA 4.0
Abstract: Transformer-based methods have recently achieved strong performance on the human-object interaction (HOI) detection task. However, most of them directly use the semantically high-level features output by the deep layers of a pre-trained backbone to produce the final HOI detection results. We argue that this may limit further performance gains because of the semantic gap between the upstream pre-training task and the HOI detection task. In this work, we design a Cross-Level Guided Attention Network (CLAN) for HOI detection. The proposed method uses the semantically high-level features from the pre-training task to generate attention scores over the low-level, primitive features, extracting the key signals for the HOI detection task. Experiments show that CLAN achieves competitive performance on both the V-COCO and HICO-DET benchmarks.
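The cross-level guidance idea can be illustrated with a minimal sketch, which is not the authors' implementation. The abstract does not specify the score function, projections, or feature shapes, so everything here is an assumption: scaled dot-product attention stands in for the attention score, and the names `cross_level_guided_attention`, `w_q`, `w_k`, `w_v`, along with all dimensions, are hypothetical. High-level features act as queries, and low-level features supply the keys and values.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_level_guided_attention(high, low, w_q, w_k, w_v):
    """Hypothetical sketch: high-level features guide attention over low-level ones.

    high: (n_high, d_high) semantically high-level features (deep backbone layer)
    low:  (n_low, d_low)   low-level, primitive features (shallow backbone layer)
    w_q, w_k, w_v: assumed projection matrices into a shared dimension d
    """
    q = high @ w_q  # queries from high-level features, shape (n_high, d)
    k = low @ w_k   # keys from low-level features, shape (n_low, d)
    v = low @ w_v   # values from low-level features, shape (n_low, d)
    # Assumed score function: scaled dot-product, normalized over low-level positions.
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n_high, n_low)
    return scores @ v, scores


# Toy shapes chosen only for illustration.
rng = np.random.default_rng(0)
n_high, n_low, d_high, d_low, d = 4, 16, 32, 8, 16
high = rng.standard_normal((n_high, d_high))
low = rng.standard_normal((n_low, d_low))
w_q = rng.standard_normal((d_high, d))
w_k = rng.standard_normal((d_low, d))
w_v = rng.standard_normal((d_low, d))

attended, scores = cross_level_guided_attention(high, low, w_q, w_k, w_v)
print(attended.shape, scores.shape)
```

In this sketch each row of `scores` is a distribution over low-level positions, so each high-level token selects the primitive features it considers most relevant. A trained model would learn the projections rather than sample them randomly.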