Uncertainty-Weighted Fusion of RGB and Synthetic Motion Cues for Video Anomaly Detection

ICLR 2026 Conference Submission22540 Authors

20 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Readers: Everyone | License: CC BY 4.0
Keywords: Modality Fusion, Bayesian Inference, Motion-Centric Event, Multimodal
TL;DR: IEF-VAD leverages motion-centric synthetic cues and fuses them with RGB through a principled uncertainty-weighted method, achieving state-of-the-art anomaly detection without event sensors.
Abstract: Most existing video anomaly detectors rely solely on RGB frames, which lack the temporal resolution needed to capture abrupt or transient motion cues—key indicators of anomalous events. To address this, we introduce a robust framework for video anomaly detection that effectively fuses complementary RGB and synthetic motion cues. Our approach, Uncertainty-Weighted Image-Event Fusion (IEF-VAD), addresses the modality imbalance inherent in such data by using a principled, uncertainty-aware process. The system (i) models the high variance and heavy-tailed noise of synthetic cues with a Student's t likelihood; (ii) derives value-level inverse-variance weights via a Laplace approximation to prevent the dominant image modality from suppressing motion-centric signals; and (iii) iteratively refines the fused latent state to remove residual cross-modal noise. This uncertainty-driven fusion consistently outperforms conventional fusion methods like cross-attention and gating, which are prone to modality dominance. Without any dedicated event sensor or frame-level labels, IEF-VAD sets a new state of the art across multiple real-world anomaly detection benchmarks, demonstrating robust performance even under modality-specific degradation. These findings highlight the utility of extracting and integrating these complementary motion cues for accurate and robust video understanding across diverse applications.
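Illustrative sketch (not the authors' implementation): steps (i) and (ii) of the abstract can be pictured as value-level precision weighting. The sketch assumes the Laplace approximation is taken at the mode of a Student's t density with scale s and ν degrees of freedom, where the curvature of the negative log-density is (ν + 1) / (ν s²), and that fusion is the standard Gaussian inverse-variance average. The function names, the per-element variance inputs, and the omission of the iterative refinement in step (iii) are all assumptions made for illustration.

```python
def laplace_variance_student_t(scale, dof):
    """Variance of the Gaussian obtained by a Laplace approximation at the
    mode of a Student's t density (assumed form, not taken from the paper).

    The negative log-density has curvature (dof + 1) / (dof * scale**2) at
    the mode, so the approximating variance is its reciprocal.
    """
    return dof * scale ** 2 / (dof + 1.0)


def fuse_inverse_variance(rgb, rgb_var, motion, motion_var):
    """Fuse two feature vectors element by element with inverse-variance
    (precision) weights. Returns the fused values and their variances."""
    fused, fused_var = [], []
    for x, vx, y, vy in zip(rgb, rgb_var, motion, motion_var):
        px, py = 1.0 / vx, 1.0 / vy          # per-value precisions
        fused.append((px * x + py * y) / (px + py))
        fused_var.append(1.0 / (px + py))    # precisions add under fusion
    return fused, fused_var


# Hypothetical usage: the motion cue gets its variance from the t-based
# approximation; the RGB variance is supplied directly.
motion_var = [laplace_variance_student_t(scale=2.0, dof=3.0)] * 2
fused, fused_var = fuse_inverse_variance(
    rgb=[1.0, 0.0], rgb_var=[1.0, 4.0],
    motion=[3.0, 2.0], motion_var=motion_var,
)
```

Because the weights are computed per value, an element where the RGB feature is uncertain (large variance) leans on the motion cue, and vice versa, which is the sense in which neither modality can dominate globally.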
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 22540