Keywords: zero-shot learning; human-object interaction; visual relationship; object detection
Abstract: Zero-shot human-object interaction (HOI) detection is an interdisciplinary field that draws on zero-shot learning,
action recognition, and visual relationship detection. It aims to identify the interactions between people and objects in a
scene, including interaction categories that were not seen during training. This paper summarizes and analyzes research
findings in zero-shot HOI detection. First, methods for improving recognition accuracy on unseen samples are grouped into
four categories: those based on semantic attributes, generative models, transfer learning, and attention mechanisms.
Representative approaches in each category are described and examined in detail. Next, applications of zero-shot HOI
detection in dynamic video recognition and assistive robotics are discussed. Finally, the paper outlines the challenges
facing zero-shot HOI detection from three perspectives: the semantic gap, long-tailed data distribution, and diversity
issues. It suggests that future work may leverage techniques such as semantic graph modeling, data augmentation
strategies, and multimodal learning.
Submission Number: 12