Unveiling Context-Related Anomalies: Knowledge Graph Empowered Decoupling of Scene and Action for Human-Related Video Anomaly Detection
Abstract: Video anomaly detection methods are mainly classified into two categories based on their primary feature types: appearance-based and action-based. Appearance-based methods rely on low-level visual features such as color, texture, and shape, learning patterns specific to the training scenes. While effective in familiar settings, they struggle with unknown or altered scenes due to poor generalization and a limited understanding of action-scene relationships. In contrast, action-based methods focus on detecting action anomalies but often overlook contextual scene associations, leading to misjudgments (e.g., running on a street being deemed normal without considering the scene context). To overcome these limitations, we propose a novel decoupling-based anomaly detection architecture (DecoAD). Its core lies in the decoupling and interweaving of scenes and actions, enabling explicit modeling of their complex relationships. By reconstructing these interactions using knowledge graphs, DecoAD achieves a deeper understanding of behaviors and contexts. This design ensures strong performance in both known and unknown scenarios, significantly enhancing generalization. To evaluate its effectiveness in dynamic scenes and its ability to handle scene-related anomalies, we introduce UFSR, the first video anomaly detection dataset featuring dynamic scenes and scene-related anomalies. DecoAD supports fully-supervised, weakly-supervised, and unsupervised settings, improving AUC on UBnormal by 1.1%, 3.1%, and 2.1% under these three settings, respectively, and on UFSR by 1.2% and 8.2% under the weakly-supervised and unsupervised settings. The source code and datasets are available at: https://github.com/liuxy3366/DecoAD.
External IDs: dblp:journals/tcsv/ChenLSLYYP25