Graph-Based Knowledge Driven Approach for Violence Detection

Published: 01 Jan 2025 · Last Modified: 25 Jan 2025 · IEEE Consumer Electronics Magazine, 2025 · CC BY-SA 4.0
Abstract: Automatically identifying violence in videos is critical, and combining visual and audio cues, which provide complementary information, is often the most effective approach to violence detection. However, existing research on fusing these cues is limited and computationally demanding. To address this, we propose a novel fused vision-based graph neural network (FV-GNN) for violence detection using audiovisual information. The approach combines local and global features from both audio and video, leveraging a residual learning strategy to extract the most informative cues. Furthermore, FV-GNN applies dynamic graph filtering to analyze the inherent relationships between audio and video samples, enhancing violence recognition. The network consists of three branches: integrated, specialized, and scoring. The integrated branch captures long-range dependencies based on feature similarity, the specialized branch focuses on local positional relationships, and the scoring branch assesses the predicted violence likelihood against the ground truth. We extensively explored the use of graphs for modeling temporal context in videos and found FV-GNN to be particularly well suited to real-time violence detection. Our experiments demonstrate that FV-GNN outperforms current state-of-the-art methods on the XD-Violence dataset.
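The two relational branches described above, one built on feature similarity (long-range dependencies) and one on temporal position (local relationships), can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the function names, the cosine-similarity and distance-decay adjacency constructions, the single-layer graph convolution, and all dimensions are illustrative choices, not taken from the source.

```python
import numpy as np

def similarity_adjacency(feats):
    # Cosine-similarity adjacency: relates temporal snippets by feature
    # affinity regardless of position (akin to the "integrated" branch).
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    adj = np.maximum(norm @ norm.T, 0.0)          # keep positive affinities
    return adj / adj.sum(axis=1, keepdims=True)   # row-normalize

def proximity_adjacency(n, sigma=1.0):
    # Positional adjacency: weight decays with temporal distance,
    # modeling local relations (akin to the "specialized" branch).
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    adj = np.exp(-dist / sigma)
    return adj / adj.sum(axis=1, keepdims=True)

def graph_conv(feats, adj, weight):
    # One graph-convolution step: aggregate neighbors, project, ReLU.
    return np.maximum(adj @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 32))   # 16 snippets of fused A/V features
w = rng.standard_normal((32, 32)) * 0.1
h_sim = graph_conv(feats, similarity_adjacency(feats), w)
h_pos = graph_conv(feats, proximity_adjacency(16), w)
fused = h_sim + h_pos                   # combine the two branch outputs
print(fused.shape)                      # (16, 32)
```

In this sketch the similarity graph is recomputed from the current features, which is the essential idea behind dynamic graph filtering: the adjacency adapts to the data rather than being fixed in advance.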