Sparse Reasoning is Enough: Biological-Inspired Framework for Abnormal Event Detection with Large Pre-trained Models


ICLR 2026 Conference Submission17417 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Abnormal Event Detection, anomaly detection

Abstract: Abnormal Event Detection (AED) plays a crucial role in real-world applications, including security surveillance, autonomous driving, and industrial monitoring. Recent advances in large pre-trained models have opened new opportunities for training-free AED by leveraging the rich prior knowledge and reasoning capabilities acquired during pretraining. However, current studies typically rely on dense frame-level inference to ensure coverage of abnormal events, incurring high computational cost and latency. This raises a fundamental question: is such dense reasoning truly necessary when deploying large pre-trained models for AED? To answer it, we propose **ReCoAED**, a new framework inspired by the reflex arc and conscious reasoning pathways of the human nervous system, which processes frames adaptively to reduce redundant computation. It consists of two core streams: i) **Re**flex reacting stream: a lightweight CLIP model compares frame features with prototype prompts to form decision vectors, which are used to query a dynamic memory of prior cases, allowing the system to rapidly decide whether to respond immediately from memory or escalate the frame for deeper reasoning. ii) **Co**nscious reasoning stream: a medium-scale (7B) vision-language model analyzes novel frames and generates event descriptions and anomaly scores that continuously update the dynamic memory. Periodically, an LLM reviews the accumulated descriptions to identify new abnormal events, refine the prototypes, and correct errors, enabling self-evolution. Extensive experiments show that ReCoAED achieves state-of-the-art training-free performance on the UCF-Crime and XD-Violence datasets while reasoning on only **28.55%** and **16.04%**, respectively, of the frames used by previous methods, showing that sparse reasoning is enough for effective large-model-based AED.
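The reflex-versus-escalate routing described in the abstract can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: frame features and prototype embeddings are plain vectors standing in for CLIP outputs, the dynamic memory is assumed to be a list of (decision vector, score) pairs searched by cosine similarity, `match_threshold` is a hypothetical parameter, and `slow_scorer` is a placeholder for the 7B vision-language model call. The periodic LLM review that refines prototypes is omitted.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def decision_vector(frame_feat: Vector, prototypes: List[Vector]) -> Vector:
    """Similarity of one frame feature to each prototype-prompt embedding."""
    return [cosine(frame_feat, p) for p in prototypes]


class ReflexRouter:
    """Hypothetical router: reuse a cached score when a similar decision vector
    is already in memory (reflex), otherwise escalate to the slow scorer."""

    def __init__(self, prototypes: List[Vector],
                 slow_scorer: Callable[[Vector], float],
                 match_threshold: float = 0.95):
        self.prototypes = prototypes
        self.slow_scorer = slow_scorer  # placeholder for the VLM call
        self.match_threshold = match_threshold
        self.memory: List[Tuple[Vector, float]] = []
        self.escalations = 0

    def score(self, frame_feat: Vector) -> float:
        d = decision_vector(frame_feat, self.prototypes)
        # Find the most similar stored decision vector, if any.
        best: Optional[Tuple[float, float]] = None
        for key, cached in self.memory:
            sim = cosine(d, key)
            if best is None or sim > best[0]:
                best = (sim, cached)
        if best is not None and best[0] >= self.match_threshold:
            return best[1]  # reflex path: answer from memory
        # Conscious path: run the expensive scorer and store the result.
        self.escalations += 1
        s = self.slow_scorer(frame_feat)
        self.memory.append((d, s))
        return s


# Toy usage: two prototypes ("normal", "abnormal") and three frames.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
router = ReflexRouter(prototypes, slow_scorer=lambda f: f[1] / (f[0] + f[1]))
frames = [[1.0, 0.1], [0.9, 0.1], [0.1, 1.0]]
scores = [router.score(f) for f in frames]
print(router.escalations)  # 2 of 3 frames escalated
```

Here the second frame is answered from memory because its decision vector nearly matches the first, so only two of the three frames reach the slow scorer; the fraction of escalated frames plays the role of the "frames reasoned on" ratio reported in the abstract.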

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 17417
