Human-Object-Object Interaction: Towards Human-Centric Complex Interaction Detection

Published: 01 Jan 2023, Last Modified: 05 Nov 2023, Venue: ACM Multimedia 2023
Abstract: Localizing and recognizing interactive actions in videos is a pivotal yet challenging task on the path towards deeper video understanding. Human-Object Interaction (HOI) detection, which detects and localizes interactions between human-object pairs, has made significant progress in recent years. However, human-object-object interaction, which is essential for real-world industrial applications, remains largely unexplored. In this paper, we introduce a novel task, Human-Object-Object Interaction (HOOI) detection, and propose a method named the Human-Object-Object Interaction Network (H2O-Net). H2O-Net consists of two main modules: sequential motion feature extraction and HOOI modeling. The former captures the gradually evolving visual characteristics of entities throughout the HOOI process by exploiting spatial-temporal features across multiple fine-grained partitions. The latter models HOOI actions through the interactions between entities: it first captures and fuses two sub-interaction features to obtain comprehensive HOOI features, and then refines them using the interaction cues embedded in the long-term global context. Furthermore, we construct a new video dataset, the HOOI dataset. Its actions correspond to key operational behaviors in industrial manufacturing, which gives it substantial application potential and makes it a valuable addition to existing interaction detection datasets. Experiments on the proposed HOOI dataset and the widely used AVA dataset show that our method outperforms existing state-of-the-art methods by 6.16 mAP and 1.9 mAP, respectively, demonstrating its effectiveness.
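The results are reported in mAP (mean average precision). To illustrate how such a number is typically obtained, the following is a minimal Python sketch of one common, non-interpolated definition of average precision (AP), averaged over action classes. The function names and input format are hypothetical. Matching detections to ground truth, for example by a spatial IoU threshold, is assumed to happen upstream. The exact evaluation protocols used for HOOI and AVA may differ in detail.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, whether it matched a ground-truth instance).
# Assumption: the matching step (e.g. IoU thresholding) was already done upstream.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_positives: int) -> float:
    """Non-interpolated AP for a single class.

    Ranks detections by descending confidence and averages the precision
    measured at the rank of each true positive. Ground-truth instances that
    are never detected contribute zero, because we divide by num_positives.
    """
    if num_positives == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this rank
    return precision_sum / num_positives


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Detection], int]],
) -> float:
    """Unweighted mean of per-class AP (hypothetical input layout:
    class name -> (detections, number of ground-truth instances))."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    toy = {
        "action_a": ([(0.9, True), (0.8, False), (0.7, True)], 2),
        "action_b": ([(0.6, True)], 1),
    }
    print(f"mAP = {100 * mean_average_precision(toy):.2f}")
```

In this toy example, class `action_a` has AP (1 + 2/3) / 2 ≈ 0.833 and `action_b` has AP 1.0, so the mAP is about 91.67 when expressed as a percentage. Because mAP is usually reported as a percentage, a margin such as 6.16 mAP would most likely correspond to 6.16 percentage points.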