Keywords: video anomaly detection, benchmark, dataset, analysis, comparison, discussion
Abstract: Benchmark datasets have fueled advances in video anomaly detection, yet they often embed hidden assumptions that distort both research focus and real-world applicability. Common benchmarks implicitly assume that anomalies are human-centric, visually salient, short-lived, and unambiguous to label, while neglecting object-driven, contextual, long-term, or ethically sensitive events. To expose and systematize these biases, we conduct the first dimensional analysis of video anomaly detection benchmarks. Our framework organizes dataset design along four principled axes: content (e.g., taxonomy, motion, modality), annotation (e.g., density, human involvement, consistency), distribution (e.g., frequency, diversity, temporal extent), and societal impact (e.g., privacy, fairness). Applying this framework, we uncover structural imbalances: most benchmarks overrepresent conspicuous human anomalies while underrepresenting subtle or multimodal patterns, and many compound these gaps with inconsistent annotation protocols and skewed anomaly distributions that confound fair evaluation. These design choices restrict the diversity of learnable patterns, bias algorithmic search spaces, and limit the operational robustness of deployed systems. We consolidate our findings into actionable guidelines for next-generation benchmarks that broaden anomaly coverage, enable reproducible evaluation, and embed social responsibility into dataset design. By reframing benchmarks through a dimensional lens, this work lays the foundation for more generalizable, equitable, and trustworthy video anomaly detection.
Primary Area: datasets and benchmarks
Submission Number: 1331
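To make the four-axis framework described in the abstract concrete, the following is a minimal Python sketch of how a benchmark could be profiled as a machine-readable audit record along the content, annotation, distribution, and societal-impact axes. All class and field names here are illustrative assumptions for exposition, not artifacts released with the paper.

```python
from dataclasses import dataclass

# Hypothetical audit schema mirroring the paper's four axes.
# Field names and types are illustrative, not the authors' actual tooling.

@dataclass
class ContentAxis:
    anomaly_taxonomy: list[str]        # e.g. ["human", "object-driven", "contextual"]
    motion_dependent: bool             # does detection require temporal cues?
    modalities: list[str]              # e.g. ["rgb", "audio", "depth"]

@dataclass
class AnnotationAxis:
    density: str                       # "video", "frame", or "pixel"-level labels
    human_annotators: int              # number of independent labelers
    inter_annotator_agreement: float   # consistency score in [0, 1]

@dataclass
class DistributionAxis:
    anomaly_frequency: float           # fraction of anomalous frames
    scene_diversity: int               # number of distinct scenes/cameras
    max_anomaly_duration_s: float      # temporal extent of the longest event

@dataclass
class SocietalAxis:
    faces_blurred: bool                # privacy protection applied?
    demographic_balance_audited: bool  # fairness review performed?

@dataclass
class BenchmarkProfile:
    name: str
    content: ContentAxis
    annotation: AnnotationAxis
    distribution: DistributionAxis
    societal: SocietalAxis

# Example: auditing a hypothetical surveillance benchmark (all values invented).
profile = BenchmarkProfile(
    name="ExampleSurveillanceBench",
    content=ContentAxis(["human", "object-driven"], True, ["rgb"]),
    annotation=AnnotationAxis("frame", 3, 0.82),
    distribution=DistributionAxis(0.05, 13, 40.0),
    societal=SocietalAxis(False, False),
)
```

Encoding each benchmark as such a profile would let the structural imbalances the abstract describes (e.g., a near-zero share of multimodal or long-duration anomalies) be compared quantitatively across datasets rather than argued anecdotally.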