A Survey of Zero-Shot HOI Detection

27 Feb 2025 (modified: 01 Mar 2025), XJTU 2025 CSUC Submission, CC BY 4.0
Keywords: zero-shot learning; human-object interaction; visual relationship; object detection
Abstract: As an interdisciplinary field encompassing zero-shot learning, action recognition, and visual relationship detection, zero-shot Human-Object Interaction (HOI) detection aims to discern the relationships between people and objects within a given scene. This paper provides a comprehensive summary and analysis of research findings in zero-shot HOI detection. First, methods for improving recognition accuracy on unseen samples are categorized into four groups: those based on semantic attributes, generative models, transfer learning, and attention mechanisms. Representative approaches within each category are described and examined. Next, the applications of zero-shot HOI detection in dynamic video recognition and assistive robotics are discussed. Finally, the paper outlines the challenges facing zero-shot HOI detection from three perspectives: the semantic gap, long-tail data distribution, and diversity issues. It further suggests that future work may leverage techniques such as semantic graph modeling, data augmentation strategies, and multimodal learning.
Submission Number: 12