Video Anomaly Detection Through Spatial–Temporal Feature Relocalization and Calibrated Trajectory Modeling

Jie Xu, Chenglizhao Chen, Xinyu Liu, Mengke Song, Huaye Zhang

Published: 12 Mar 2026, Last Modified: 22 Mar 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: Existing video anomaly detection methods rely heavily on pixel-space reconstruction and are sensitive to background noise and object scale variations. To address these limitations, this paper proposes a self-supervised contrastive learning approach that integrates spatial–temporal feature relocalization with camera-calibrated trajectory modeling. The method takes spatial–temporal feature relocalization as its core task and builds a feature-level contrastive learning mechanism that guides the model to focus on discriminative local appearance variations and global temporal semantic evolution. This suppresses background interference and scale-related noise while strengthening the modeling of fine-grained appearance anomalies and global, action-related temporal anomalies. In addition, camera calibration is used to recover continuous object trajectories in physical space, and a temporal aggregation module jointly models object motion patterns in pixel space and physical space, improving the model’s ability to perceive complex anomalous behaviors. Experiments on multiple public video anomaly detection benchmarks show that the proposed method consistently outperforms existing approaches, supporting its effectiveness and generalization capability.
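
The abstract does not say how calibration maps pixel tracks into physical space. As a rough illustration only, the sketch below assumes a planar ground-plane homography, which is one common way to do this and not necessarily the authors' formulation. The names `pixel_to_ground`, `trajectory_to_ground`, and `step_speeds`, and the 3x3 matrix `H`, are hypothetical and introduced here for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def pixel_to_ground(H: Matrix3, point: Point) -> Point:
    """Map one pixel coordinate (u, v) to ground-plane coordinates.

    Assumption: H is a 3x3 image-to-ground homography obtained from
    camera calibration; the paper may use a different camera model.
    """
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    # Divide by the homogeneous coordinate to get Euclidean coordinates.
    return (x / w, y / w)


def trajectory_to_ground(H: Matrix3, track: Sequence[Point]) -> List[Point]:
    """Project a whole pixel-space track onto the ground plane."""
    return [pixel_to_ground(H, p) for p in track]


def step_speeds(track: Sequence[Point], fps: float) -> List[float]:
    """Per-step speed along a track, in track units per second."""
    return [
        math.hypot(x1 - x0, y1 - y0) * fps
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]


if __name__ == "__main__":
    # Hypothetical homography with a mild perspective term in the last row.
    H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.001, 1.0]]
    pixel_track = [(100.0, 1000.0), (120.0, 900.0), (140.0, 800.0)]
    ground_track = trajectory_to_ground(H, pixel_track)
    print(ground_track)
    print(step_speeds(ground_track, fps=25.0))
```

Under this assumption, the same object yields two tracks: one in pixels and one in ground-plane units. Speeds computed on the ground-plane track do not shrink with distance from the camera, which is one plausible reason for modeling motion in both spaces jointly.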