Mixture of Experts Guided by Gaussian Splatters Matters: A new Approach to Weakly-Supervised Video Anomaly Detection

26 Sept 2024 (modified: 05 Feb 2025) | Submitted to ICLR 2025 | CC BY 4.0
Keywords: Video Anomaly Detection, Weakly-supervised Learning
TL;DR: A Novel Approach to Weakly-Supervised Video Anomaly Detection
Abstract: Video Anomaly Detection (VAD) has proved to be a challenging task due to the inherent variability of anomalous events and the scarcity of available data. Under the common Weakly-Supervised VAD (WSVAD) paradigm, only a video-level label is available during training, while predictions are made at the frame level. Despite decent progress on simple anomalous events (such as explosions), more complex real-world anomalies (such as shoplifting) remain challenging. There are two main reasons for this: (I) current state-of-the-art models do not address the diversity between anomalies during training and process diverse categories of anomalies with a shared model, thereby ignoring the category-specific key attributes; and (II) the lack of precise temporal information (i.e., weak supervision) limits the ability to learn how to capture complex abnormal attributes that can blend with normal events, effectively allowing the model to use only the most abnormal snippets of an anomaly. We hypothesize that these issues can be addressed by sharing the task between multiple expert models, which would increase the possibility of correctly encoding the singular characteristics of different anomalies. Furthermore, multiple Gaussian kernels can guide the experts towards a more comprehensive and complete representation of anomalous events, ensuring that each expert precisely distinguishes between normal and abnormal events at the frame level. To this end, we introduce Gaussian Splatting-guided Mixture of Experts (GS-MoE), a novel approach that leverages a set of experts trained with a temporal Gaussian splatting loss on specific classes of anomalous events and integrates their predictions via a mixture of expert models to capture complex relationships between different anomalous patterns.
The introduction of the temporal Gaussian splatting loss allows the model to leverage temporal consistency in weakly-labeled data, enabling more robust identification of subtle anomalies over time. The novel loss function, designed to enhance weak supervision, further improves model performance by guiding expert networks to focus on segments of data with a higher likelihood of containing anomalies. Experimental results on the UCF-Crime and XD-Violence datasets demonstrate that our framework achieves state-of-the-art (SOTA) performance, scoring 91.58% AUC on UCF-Crime.
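The abstract does not give the exact form of the temporal Gaussian splatting loss or of the expert mixture, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that Gaussian kernels are placed at local peaks of a per-snippet anomaly score to form soft targets that cover more than the single most abnormal snippet, and that expert outputs are combined with softmax gate weights. The function names `gaussian_targets` and `mix_experts`, and the parameters `sigma` and `threshold`, are hypothetical.

```python
import numpy as np

def gaussian_targets(scores, sigma=2.0, threshold=0.5):
    """Soft per-snippet targets: one Gaussian per local score peak (assumed scheme)."""
    scores = np.asarray(scores, dtype=float)
    t = np.arange(len(scores))
    targets = np.zeros_like(scores)
    for i in range(len(scores)):
        left = scores[i - 1] if i > 0 else -np.inf
        right = scores[i + 1] if i < len(scores) - 1 else -np.inf
        # A peak is a snippet above the threshold that dominates its neighbours.
        if scores[i] >= threshold and scores[i] > left and scores[i] >= right:
            kernel = scores[i] * np.exp(-0.5 * ((t - i) / sigma) ** 2)
            # Overlapping kernels are merged by taking the pointwise maximum.
            targets = np.maximum(targets, kernel)
    return targets

def mix_experts(expert_scores, gate_logits):
    """Combine expert score curves with softmax gate weights (assumed gating)."""
    expert_scores = np.asarray(expert_scores, dtype=float)  # (num_experts, T)
    gate_logits = np.asarray(gate_logits, dtype=float)      # (num_experts,)
    weights = np.exp(gate_logits - gate_logits.max())       # stable softmax
    weights /= weights.sum()
    return weights @ expert_scores                          # (T,)

if __name__ == "__main__":
    # Hypothetical snippet scores for one anomalous video.
    scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.1, 0.7, 0.2]
    print(gaussian_targets(scores).round(3))
    print(mix_experts([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]))
```

In this sketch, snippets adjacent to a peak receive a non-zero target even when their own score is low, which is one way to read the abstract's goal of a "more comprehensive and complete representation of anomalous events" under weak supervision.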
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7035