MSTAgent-VAD: Multi-scale video anomaly detection using time agent mechanism for segments' temporal context mining

Published: 01 Jan 2025, Last Modified: 22 Jul 2025, Expert Syst. Appl. 2025, CC BY-SA 4.0
Abstract: Due to the lack of frame-level annotations during training, video anomaly detection (VAD) requires learning methods that work without comprehensive supervision. Previous approaches have focused on modeling temporal relationships and learning discriminative features, but they often detect anomalies incompletely and separate normal from abnormal segments poorly. To address these issues, we propose MSTAgent-VAD, a multi-scale VAD method built on a time agent mechanism, with contributions in both model structure and feature learning. First, because anomalous events in videos vary in temporal scale, we design a multi-scale temporal attention module that captures temporal features of abnormal segments of varying lengths, enhancing temporal consistency and addressing the difficulty of detecting anomalies of diverse durations. Second, by generating temporal agent tokens with deformable convolution, the time agent mechanism improves the separation between normal and abnormal segments in feature space, especially for incomplete anomalies and blurred boundaries, thus enhancing model discrimination. Finally, based on the multi-instance learning (MIL) strategy, an improved robust temporal feature magnitude (RTFM) learning method is used to detect multiple discrete abnormal segments, addressing the difficulty traditional methods have in identifying diverse anomalies in complex scenes and supporting accurate detection of multiple types of anomalous events. Experimental results show that our method achieves state-of-the-art detection performance on the UCSD-Ped2, CUHK Avenue, ShanghaiTech, and UCF-Crime datasets, accurately identifying diverse anomalies and showing strong generalization. This study provides a VAD solution for surveillance applications, improving detection performance in real-world scenarios.
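
To make the MIL and feature-magnitude idea in the abstract concrete, the following is a minimal sketch of a top-k feature-magnitude margin loss in the style of the original RTFM formulation. It is not the improved variant proposed in the paper, whose details the abstract does not give. The function names, the choice of `k`, the `margin` value, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def feature_magnitudes(segments):
    """Return the L2 norm of each segment's feature vector."""
    return [math.sqrt(sum(x * x for x in seg)) for seg in segments]


def topk_mean_magnitude(segments, k):
    """Mean of the k largest segment feature magnitudes in one video (bag)."""
    mags = sorted(feature_magnitudes(segments), reverse=True)
    k = max(1, min(k, len(mags)))  # clamp k to the number of segments
    return sum(mags[:k]) / k


def magnitude_margin_loss(abnormal_video, normal_video, k=3, margin=1.0):
    """Hinge loss that pushes the abnormal video's top-k mean magnitude
    above the normal video's by at least `margin` (hypothetical default)."""
    gap = topk_mean_magnitude(abnormal_video, k) - topk_mean_magnitude(normal_video, k)
    return max(0.0, margin - gap)


# Toy example: each video is a list of per-segment feature vectors.
abnormal = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
normal = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
loss = magnitude_margin_loss(abnormal, normal, k=2, margin=10.0)
```

Because only video-level labels are available, the loss compares whole videos through their top-k segments; taking k greater than 1 is what lets several discrete abnormal segments in one video contribute to the training signal.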