Enhancing Generalization in Video Anomaly Detection through Multimodal Data Mixing

Published: 2025, Last Modified: 09 Nov 2025. Venue: IPDPS (Workshops) 2025. License: CC BY-SA 4.0
Abstract: Video anomaly detection (VAD) plays a critical role in identifying rare and unusual events in video streams, with applications ranging from surveillance to industrial monitoring. However, the generalization of VAD models to diverse datasets and anomaly types remains a significant challenge due to the limited amount of training data. In this work, we propose novel generalization techniques for the state-of-the-art transformer-based model, AnomalyClip. Our approach leverages multimodal data mixing, combining external datasets with textual descriptions to generate pseudo-anomaly samples through Adaptive Instance Normalization and Gaussian blending. These methods enhance feature representations, enabling the model to better generalize to unseen scenarios. Experimental evaluations on benchmarks such as ShanghaiTech, UCF-Crime, and XD-Violence demonstrate the efficacy of our techniques, achieving significant improvements in area under the curve metrics. This work highlights the potential of training-focused strategies to improve the robustness and scalability of VAD systems in high-performance computing contexts.
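The abstract names two mixing operations, Adaptive Instance Normalization (AdaIN) and Gaussian blending, without giving their exact form. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes features are stored as time-by-channel lists, that AdaIN statistics are computed per channel over the temporal axis, and that "Gaussian blending" means mixing a pseudo-anomaly into a normal sequence with a Gaussian-shaped temporal weight. The function names `adain`, `gaussian_weights`, and `gaussian_blend`, and the parameters `center` and `sigma`, are hypothetical.

```python
import math
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """Standard AdaIN: shift each channel of `content` to the mean/std of `style`.

    Assumption: both inputs are lists of feature vectors (time x channels),
    and statistics are taken over the time axis of each channel.
    """
    channels = len(content[0])
    out = [[0.0] * channels for _ in content]
    for c in range(channels):
        xc = [row[c] for row in content]
        yc = [row[c] for row in style]
        mu_x, sd_x = fmean(xc), pstdev(xc)
        mu_y, sd_y = fmean(yc), pstdev(yc)
        for t, v in enumerate(xc):
            # Normalize with content statistics, rescale with style statistics.
            out[t][c] = sd_y * (v - mu_x) / (sd_x + eps) + mu_y
    return out


def gaussian_weights(length, center, sigma):
    """Hypothetical temporal weights: 1.0 at `center`, decaying with distance."""
    return [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(length)]


def gaussian_blend(normal, pseudo, center, sigma):
    """Assumed blending rule: per-frame convex mix weighted by a Gaussian window.

    Returns the blended sequence and the weights, which could also serve
    as soft frame-level pseudo-labels.
    """
    w = gaussian_weights(len(normal), center, sigma)
    blended = [
        [(1.0 - wt) * n + wt * p for n, p in zip(n_row, p_row)]
        for wt, n_row, p_row in zip(w, normal, pseudo)
    ]
    return blended, w


# Toy example: 3 frames x 2 channels of "normal" features, 4 frames of "external" features.
normal_feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
external_feats = [[10.0, -1.0], [20.0, 0.0], [30.0, 1.0], [20.0, 0.0]]

stylized = adain(normal_feats, external_feats)
mixed, weights = gaussian_blend(normal_feats, stylized, center=1, sigma=0.5)
```

In this sketch the blended sequence equals the stylized features at the chosen center frame and falls back toward the original features away from it. How the paper selects the center, the width, and the source of the style statistics (external video or text features) is not stated in the abstract.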