Delving Into Instance Modeling for Weakly Supervised Video Anomaly Detection

Shengyang Sun, Jiashen Hua, Junyi Feng, Dongxu Wei, Baisheng Lai, Xiaojin Gong

Published: 01 Aug 2025, Last Modified: 10 Nov 2025IEEE Transactions on Circuits and Systems for Video TechnologyEveryoneRevisionsCC BY-SA 4.0

Abstract: Weakly-supervised video anomaly detection (WS-VAD) aims to identify fine-grained anomalies from sparse video-level labels, which has gained increasing attention in recent years due to its various applications such as disaster warning and public security. Recent studies typically formulate WS-VAD as a multi-instance learning (MIL) problem. However, they neglect the instance creation process and simply apply a uniform temporal pooling (UTP) operation to obtain the training instances, leading to severe anomaly contamination and dilution. In this paper, we emphasize the importance of the instance modeling procedure and propose two simple yet effective modules, i.e., the dynamic segment merging (DSM) module and the retrieval-augmented anomaly restoration (RA2R) module, to tackle the problem from segment-level and feature-level, respectively. We equip various state-of-the-art WS-VAD models with the proposed methods and conduct thorough experiments on the challenging datasets, e.g., UCF-Crime, and XD-Violence. Results demonstrate the proposed method brings consistent performance improvement and establishes new state-of-the-art.

External IDs:doi:10.1109/tcsvt.2025.3546766