Com-STAL: Compositional Spatio-Temporal Action Localization

Published: 01 Jan 2023, Last Modified: 19 Mar 2024 | IEEE Trans. Circuits Syst. Video Technol. 2023 | Readers: Everyone
Abstract: Spatio-temporal action localization aims to locate the spatial and temporal positions of actors and classify their actions. However, prior research overlooks the fact that, in real-world scenarios, human actions often involve novel objects. Neglecting the variety of action-object combinations considerably limits the generalization of the resulting models. In this paper, we study action-object combinations by examining their multi-modal visual information. To this end, we propose a novel compositional spatio-temporal action localization (Com-STAL) task, in which the training and test sets contain non-overlapping action-object combinations. Based on this task, we construct a compositional action localization dataset (Com-AD). In addition, we propose a simple yet effective framework, the Instance-Centric Interaction Network (ICIN), which reduces invalid inductive biases within the visual modality and alleviates the combined distribution bias issue by leveraging additional modal information. Extensive experimental results on Com-AD demonstrate the superior action localization performance of ICIN.
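
To make the idea of non-overlapping action-object combinations concrete, the following is a minimal sketch of a compositional split. It is not the Com-AD construction procedure, which the abstract does not describe. The sample fields (`clip`, `action`, `object`), the toy labels, and the helper names `compositional_split` and `pairs` are all hypothetical and chosen only for illustration.

```python
def compositional_split(samples, test_pairs):
    """Route each sample to test if its (action, object) pair is held out,
    otherwise to train. Hypothetical helper, not the authors' code."""
    train, test = [], []
    for sample in samples:
        pair = (sample["action"], sample["object"])
        if pair in test_pairs:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


def pairs(split):
    """Return the set of (action, object) combinations present in a split."""
    return {(s["action"], s["object"]) for s in split}


# Toy annotations (assumed format): one action-object label per clip.
samples = [
    {"clip": "v1", "action": "open", "object": "door"},
    {"clip": "v2", "action": "open", "object": "box"},
    {"clip": "v3", "action": "close", "object": "door"},
    {"clip": "v4", "action": "close", "object": "box"},
]

# Combinations held out for testing (an arbitrary illustrative choice).
test_pairs = {("open", "box"), ("close", "door")}

train, test = compositional_split(samples, test_pairs)

# No (action, object) combination appears in both splits.
assert pairs(train).isdisjoint(pairs(test))
```

In this toy split every action and every object still appears in training, but each held-out pairing is seen only at test time, which is the kind of generalization the task description targets.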