Cog-VADU: A Training-Free Cognitive Reasoning Framework for Video Anomaly Detection and Understanding
Abstract: Video Anomaly Detection (VAD) aims to temporally localize abnormal events in videos.
Most existing approaches rely on dataset-specific training and curated annotations, limiting generalization in open-set scenarios.
Recent zero-shot methods based on Large Vision-Language Models (LVLMs) alleviate this dependency but often lack temporal continuity and structured reasoning. We propose \textbf{Cog-VADU}, a fully training-free framework that reformulates VAD as a sequential cognitive reasoning task.
Cog-VADU introduces \emph{Chain-of-Anomaly Detection Thought Prompting} (CoADTP), which unrolls an LVLM into a recurrent reasoning chain across video segments.
By propagating structured rationales over time, the model maintains an implicit temporal memory, enabling robust discrimination between complex anomalies and high-motion normal activities. To improve reliability, we further design a cross-modal re-ranking stage that aligns textual rationales with visual embeddings, enforcing semantic consistency and temporal coherence to refine and stabilize predictions. Extensive experiments on multiple public VAD benchmarks demonstrate that Cog-VADU achieves competitive zero-shot performance. Moreover, cross-model evaluations show that CoADTP consistently enhances reasoning-based anomaly detection in a model-agnostic manner, providing interpretable and generalizable anomaly understanding for real-world applications.
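To make the abstract's description of CoADTP concrete, the following is a minimal sketch of unrolling an LVLM into a recurrent reasoning chain over video segments. It assumes a generic `lvlm_query(frames, prompt)` callable returning a (rationale, score) pair; the prompt wording and helper names are illustrative assumptions, not the paper's actual implementation.

```python
def coadtp(segments, lvlm_query):
    """Run an LVLM as a recurrent reasoning chain over video segments,
    feeding each step's rationale back in as implicit temporal memory.

    Note: `lvlm_query` is a hypothetical interface assumed for this sketch.
    """
    rationale = "No prior observations."  # textual state carried across steps
    scores = []
    for frames in segments:  # temporally ordered video segments
        prompt = (
            "Prior reasoning about this video:\n"
            f"{rationale}\n\n"
            "Describe the current segment, reason step by step about whether "
            "an anomalous event occurs, and give an anomaly score in [0, 1]."
        )
        # The new rationale replaces the old one, so temporal context is
        # propagated implicitly through text rather than model weights.
        rationale, score = lvlm_query(frames, prompt)
        scores.append(score)
    return scores
```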
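Likewise, the cross-modal re-ranking stage could be sketched as below, assuming rationales and segments are embedded in a shared text-image space (e.g., a CLIP-style encoder). The linear fusion rule, the weight `alpha`, and the moving-average window are assumptions standing in for the paper's actual re-ranking formula.

```python
import numpy as np

def cross_modal_rerank(scores, rationale_embs, segment_embs, alpha=0.5, window=3):
    """Re-weight per-segment anomaly scores by rationale/segment agreement
    (cosine similarity in a shared embedding space), then smooth the result
    with a moving average for temporal coherence.

    scores:         (T,)  per-segment anomaly scores from the reasoning chain
    rationale_embs: (T,D) embeddings of the textual rationales
    segment_embs:   (T,D) embeddings of the corresponding video segments
    """
    t = rationale_embs / np.linalg.norm(rationale_embs, axis=1, keepdims=True)
    v = segment_embs / np.linalg.norm(segment_embs, axis=1, keepdims=True)
    consistency = (t * v).sum(axis=1)  # per-segment cosine similarity
    # Fuse the LVLM score with the cross-modal consistency signal (assumed
    # linear fusion), then enforce temporal coherence by local averaging.
    fused = alpha * np.asarray(scores) + (1.0 - alpha) * consistency
    return np.convolve(fused, np.ones(window) / window, mode="same")
```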
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: N/A
Assigned Action Editor: ~Lu_Jiang1
Submission Number: 8561