Semantic-driven dual consistency learning for weakly supervised video anomaly detection

Published: 01 Jan 2025, Last Modified: 22 Jul 2025, Pattern Recognit. 2025, CC BY-SA 4.0
Abstract: Highlights
• In this paper, we propose a weakly supervised paradigm of cross-modal detection and consistency learning that leverages dual consistency to provide discriminative representations for anomalies at both the semantic-to-target and target-to-snippet levels.
• Specifically, we introduce a cross-modal detection network that detects targets in each frame according to given semantic rules to derive semantic-consistent visual embeddings.
• To delineate a clear boundary between anomalies and normal events, a cross-domain alignment module is proposed to enhance the discriminative representation of abnormal targets by learning the contextual consistency between the target and snippet embeddings.
• Our architecture integrates the detection of semantic-consistent targets based on variable semantic rules, which supports transferable deployment across scenarios and enables comprehensive identification, localization, and recognition of abnormal events through a “when-where-which” pipeline.
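The highlights do not specify the training objectives. As a rough illustration only, the sketch below assumes that target-to-snippet consistency can be scored with cosine similarity, and that the weak video-level labels are handled with the top-k multiple-instance aggregation that is common in weakly supervised anomaly detection. The function names `cosine`, `target_snippet_consistency_loss`, `topk_video_score`, and `video_bce` are hypothetical and do not come from the paper; this is not the authors' method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def target_snippet_consistency_loss(targets: List[Vector], snippet: Vector) -> float:
    """Hypothetical target-to-snippet consistency term (assumption, not from the paper):
    1 - mean cosine similarity between each detected target embedding and its snippet embedding."""
    if not targets:
        return 0.0
    mean_sim = sum(cosine(t, snippet) for t in targets) / len(targets)
    return 1.0 - mean_sim


def topk_video_score(snippet_scores: Sequence[float], k: int) -> float:
    """Common MIL-style aggregation (assumed here): the video-level anomaly score
    is the mean of the k highest snippet scores."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)


def video_bce(video_score: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy against the weak video-level label (1 = anomalous, 0 = normal)."""
    p = min(max(video_score, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


if __name__ == "__main__":
    # Two detected targets: one aligned with the snippet embedding, one orthogonal to it.
    targets = [[1.0, 0.0], [0.0, 1.0]]
    snippet = [1.0, 0.0]
    print(target_snippet_consistency_loss(targets, snippet))  # 0.5

    # Only the video-level label is known; snippet scores are aggregated with top-k.
    scores = [0.1, 0.9, 0.7, 0.2]
    video_score = topk_video_score(scores, k=2)
    print(video_bce(video_score, label=1))
```

Top-k pooling is used here because, with only video-level labels, the most anomalous snippets are the natural candidates to carry the positive signal; averaging over k rather than taking the single maximum makes the score less sensitive to one noisy snippet.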