Focus on Subtle Actions: Semantic and Saliency Knowledge Co-Propagation Method for Weakly-Supervised Temporal Action Localization

Published: 01 Jan 2024, Last Modified: 13 Apr 2025 · PRCV (10) 2024 · CC BY-SA 4.0
Abstract: Weakly-supervised temporal action localization (WTAL) aims to classify and localize actions using only video-level labels. Existing methods rely on increasingly complex networks and pre-trained backbones to mine visual information, but fail to effectively focus on the subtle-motion subjects in a video, producing large errors under large-scale background disturbance and weak temporal contextual correlation. To enhance the model’s ability to focus on subtle-motion subjects, we propose a Semantic and Saliency Knowledge Co-Propagation (S\(^{2}\)KPro) method consisting of a Class-Aware Branch, a Saliency-Aware Branch, a Knowledge Interaction Module, and a Knowledge Internalization Module that collaborate to propagate knowledge. Specifically, the two branches acquire semantic and saliency knowledge respectively and establish an initial connection between the two types of knowledge. The Knowledge Interaction Module propagates knowledge from one branch to the other through co-distillation, integrating the complementary information of both. The Knowledge Internalization Module identifies key snippets carrying both critical semantic and saliency information, as well as ambiguous snippets with unbalanced semantic and saliency content; through contrastive learning, it generates more accurate feature representations that condense the shared information of the key snippets and alleviate the information imbalance of the ambiguous snippets. Extensive experiments demonstrate that S\(^{2}\)KPro achieves state-of-the-art results on the THUMOS14 and ActivityNet1.2 datasets. The code is available at https://github.com/shouhyshy/S2KPro.
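The co-distillation step in the Knowledge Interaction Module can be illustrated with a minimal sketch. The paper's exact formulation is not given in the abstract, so the function below is an assumption: it treats each branch's output as snippet-level class logits and uses a symmetric (bidirectional) temperature-softened KL divergence, a common choice for mutual distillation between two branches.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_distillation_loss(logits_a, logits_b, tau=2.0, eps=1e-8):
    """Hypothetical co-distillation objective between two branches.

    logits_a, logits_b: (T, C) snippet-level class logits from the
    Class-Aware and Saliency-Aware branches (names assumed here).
    Returns the mean symmetric KL between their softened distributions,
    so each branch is pulled toward the other's knowledge.
    """
    p = softmax(logits_a / tau, axis=-1)
    q = softmax(logits_b / tau, axis=-1)
    kl_pq = (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=-1)
    kl_qp = (q * (np.log(q + eps) - np.log(p + eps))).sum(axis=-1)
    return 0.5 * float((kl_pq + kl_qp).mean())
```

When the two branches agree exactly the loss is zero, and it grows as their snippet-level predictions diverge; in practice each branch's target distribution would be detached from the gradient so that knowledge flows in both directions without collapsing the branches into one.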