Semi-supervised Video Anomaly Detection With Compact Deformable 3D Convolution

Shibo Gao, Peipei Yang, Linlin Huang

Published: 01 Jan 2025, Last Modified: 15 Jul 2025, ICASSP 2025, CC BY-SA 4.0

Abstract: Semi-supervised video anomaly detection (SVAD) is a challenging computer vision task due to the diversity, randomness, and rarity of abnormal events in videos. A series of important SVAD methods follow the frame prediction strategy: the model is trained only on normal videos, so frames with higher prediction error are regarded as anomalous. However, these methods often suffer from inadequate sensitivity to anomalies because the prediction error of anomalous frames is not sufficiently distinguishable from that of normal frames. In this paper, we propose a compact deformable 3D convolution (CD3D) for feature extraction in SVAD models, which effectively improves the ability to discriminate between normal and anomalous frames. In CD3D, the offsets of the sampling locations are predicted by applying a set of extra separable 3D convolutions to multiple frames, so that context information is better utilized at reduced computational cost. Our proposed method achieves the best performance among methods that do not rely on additional supervised information. It can also be conveniently applied to methods that use extra supervised information, further improving their performance.
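
The frame prediction strategy described in the abstract can be illustrated with a minimal sketch of the scoring step. The abstract does not state which error metric or normalization the authors use, so mean squared error, per-video min-max normalization, and the 0.5 threshold are assumptions made here for illustration, and the function names are hypothetical. The predictor itself (the CD3D-based model) is not reproduced; the sketch only assumes that per-frame predictions are already available.

```python
from typing import List, Sequence

# A grayscale frame represented as rows of pixel intensities (assumed layout).
Frame = Sequence[Sequence[float]]


def mean_squared_error(predicted: Frame, actual: Frame) -> float:
    """Mean squared per-pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for pred_row, true_row in zip(predicted, actual):
        for p, t in zip(pred_row, true_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def anomaly_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalize per-frame prediction errors to [0, 1] within one video.

    Assumption: a higher prediction error means a more anomalous frame.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames are equally well predicted; nothing stands out.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


def flag_anomalies(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose score exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # Toy per-frame errors: the third frame is predicted much worse than the rest.
    errors = [0.01, 0.02, 0.5, 0.015]
    scores = anomaly_scores(errors)
    print(flag_anomalies(scores))
```

In this toy setting, the sensitivity problem raised in the abstract corresponds to the case where the errors of anomalous frames sit close to those of normal frames, so that no threshold separates them cleanly.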