Towards zero-shot human-object interaction detection via vision-language integration

Published: 01 Jan 2025, Last Modified: 17 Jul 2025, Neural Networks 2025, CC BY-SA 4.0
Abstract: Highlights
• Our KI2HOI effectively utilizes VLM’s visual–linguistic knowledge and achieves superior zero-shot transferability.
• We develop visual-level and linguistic-level strategies to fuse spatial and semantic information.
• Extensive experiments show state-of-the-art results on HICO-DET and V-COCO in both zero-shot and supervised settings.