Conjoined triple deep network for video anomaly detection

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · Multimedia Tools and Applications 2024 · CC BY-SA 4.0
Abstract: The video anomaly detection task typically involves identifying anomalous targets, behaviors, and events in surveillance video using only normal training samples. Most mainstream anomaly detection models train an encoder-decoder network exclusively on normal samples and flag frames with large reconstruction errors as anomalous. Such methods struggle to limit the reconstruction model's generalization to anomalous samples and to avoid biasing reconstruction maps toward small-scale anomalies. To address these issues, we propose a triple-stream framework for anomaly detection that combines cross-prediction proxy tasks with multiple local probabilistic models. We incorporate a dual learning mechanism in both the appearance and motion channels, whose mutual feedback encourages the model to overfit to normal samples and correspondingly weakens its generalization to anomalous ones. Additionally, we apply an attention mechanism to the network, design a feature consistency function to constrain bias toward local features, and construct a probability model for each local region to detect larger-scale anomalies. Finally, we design a fusion scheme to evaluate anomaly scores for video frames. Evaluations on popular benchmark datasets, including UCSD, Avenue, and Street Scene, demonstrate that our proposed model achieves competitive performance compared to state-of-the-art methods.
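The reconstruction-error scoring that the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-frame mean-squared error and the min-max normalization over a clip are common conventions in this line of work, and the function name and array shapes are assumptions.

```python
import numpy as np

def frame_anomaly_scores(frames: np.ndarray, recon: np.ndarray) -> np.ndarray:
    """Score each frame by its reconstruction error (hypothetical helper).

    frames, recon: arrays of shape (T, H, W, C) holding the input frames
    and the encoder-decoder's reconstructions. Returns T scores in [0, 1],
    where larger values indicate more anomalous frames.
    """
    # Per-frame mean squared reconstruction error over pixels and channels.
    err = np.mean((frames - recon) ** 2, axis=(1, 2, 3))
    # Min-max normalize over the clip; epsilon guards against a zero range.
    lo, hi = err.min(), err.max()
    return (err - lo) / (hi - lo + 1e-8)

# Toy usage: a clip of 4 identical frames, with the reconstruction of
# frame 2 deliberately corrupted to mimic an unreconstructable anomaly.
frames = np.zeros((4, 8, 8, 3))
recon = frames.copy()
recon[2] += 0.5
scores = frame_anomaly_scores(frames, recon)
# Frame 2 receives the highest (near-1.0) anomaly score.
```

A model overfit to normal data reconstructs normal frames accurately, so their errors stay near the low end of this normalized score, while anomalous frames stand out.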