Mixture of Experts Guided by Gaussian Splatters Matters: A New Approach to Weakly-Supervised Video Anomaly Detection
Keywords: Video Anomaly Detection, Weakly-supervised Learning
TL;DR: A Novel Approach to Weakly-Supervised Video Anomaly Detection
Abstract: Video Anomaly Detection (VAD) has proved to be a challenging task due to the inherent variability of anomalous events and the scarcity of available data. Under the common Weakly-Supervised VAD (WSVAD) paradigm, only a video-level label is available during training, while predictions are carried out at the frame level. Despite decent progress on simple anomalous events (such as explosions), more complex real-world anomalies (such as shoplifting) remain challenging. There are two main reasons for this: (I) current state-of-the-art models do not address the diversity among anomalies during training and process diverse categories of anomalies with a shared model, thereby ignoring category-specific key attributes; and (II) the lack of precise temporal information (i.e., weak supervision) limits the ability to capture complex abnormal attributes that can blend with normal events, effectively restricting the model to using only the most abnormal snippets of an anomaly. We hypothesize that these issues can be addressed by sharing the task among multiple expert models, which increases the likelihood of correctly encoding the distinctive characteristics of different anomalies. Furthermore, multiple Gaussian kernels can guide the experts toward a more comprehensive and complete representation of anomalous events, ensuring that each expert precisely distinguishes between normal and abnormal events at the frame level. To this end, we introduce Gaussian Splatting-guided Mixture of Experts (GS-MoE), a novel approach that trains a set of experts with a temporal Gaussian splatting loss on specific classes of anomalous events and integrates their predictions via a mixture-of-experts model to capture complex relationships between different anomalous patterns. The temporal Gaussian splatting loss allows the model to leverage temporal consistency in weakly-labeled data, enabling more robust identification of subtle anomalies over time. This loss, designed to strengthen weak supervision, further improves performance by guiding the expert networks to focus on segments of data with a higher likelihood of containing anomalies. Experimental results on the UCF-Crime and XD-Violence datasets demonstrate that our framework achieves state-of-the-art performance, scoring 91.58% AUC on UCF-Crime.
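The abstract does not give the exact formulation, but the following minimal sketch illustrates one plausible reading of the two ingredients it describes: a temporal Gaussian splatting loss that spreads weak video-level supervision over neighboring snippets via a Gaussian kernel, and a gated mixture of per-category experts whose frame-level scores are fused. All names, the choice of Gaussian-target construction, and the gating scheme are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a temporal Gaussian splatting loss and MoE fusion for WSVAD.
# Everything below (function/module names, sigma, gating) is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


def temporal_gaussian_target(scores: torch.Tensor, sigma: float = 2.0) -> torch.Tensor:
    """Splat a Gaussian kernel around the highest-scoring snippet of each video.

    scores: (B, T) snippet-level anomaly scores in [0, 1].
    Returns a (B, T) soft target that spreads supervision beyond the single
    most abnormal snippet, encouraging temporally consistent predictions.
    """
    B, T = scores.shape
    centers = scores.argmax(dim=1, keepdim=True).float()             # (B, 1)
    t = torch.arange(T, device=scores.device).float().unsqueeze(0)   # (1, T)
    return torch.exp(-0.5 * ((t - centers) / sigma) ** 2)            # (B, T)


def gaussian_splatting_loss(scores: torch.Tensor, video_labels: torch.Tensor,
                            sigma: float = 2.0) -> torch.Tensor:
    """Weak supervision: abnormal videos are pulled toward a Gaussian target
    centered on their peak score; normal videos are pushed toward zero."""
    target = temporal_gaussian_target(scores, sigma) * video_labels.view(-1, 1)
    return F.binary_cross_entropy(scores, target.detach())


class GatedMixtureOfExperts(nn.Module):
    """Per-category expert heads whose snippet-level scores are fused by a gate."""

    def __init__(self, feat_dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                          nn.Linear(128, 1), nn.Sigmoid())
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(feat_dim, num_experts)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, D) snippet features -> (B, T) fused anomaly scores
        expert_scores = torch.stack(
            [e(feats).squeeze(-1) for e in self.experts], dim=-1)    # (B, T, E)
        gate_weights = F.softmax(self.gate(feats), dim=-1)           # (B, T, E)
        return (expert_scores * gate_weights).sum(dim=-1)            # (B, T)


if __name__ == "__main__":
    # 13 experts, e.g. one per UCF-Crime anomaly class (illustrative choice).
    model = GatedMixtureOfExperts(feat_dim=1024, num_experts=13)
    feats = torch.randn(4, 32, 1024)          # 4 videos, 32 snippets each
    labels = torch.tensor([1., 0., 1., 0.])   # video-level weak labels
    scores = model(feats)
    loss = gaussian_splatting_loss(scores, labels)
    loss.backward()
    print(scores.shape, float(loss))
```

In this reading, the Gaussian target softens the usual top-1/top-k snippet supervision so that frames adjacent to the most abnormal snippet also receive positive signal, which is one way to realize the "temporal consistency" the abstract attributes to the loss.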
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7035