Flashback: Memory-Driven Zero-shot, Real-time Video Anomaly Detection

Hyogun Lee; Haksub Kim; Ig-Jae Kim; Yonghun Choi

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0

Keywords: video anomaly detection, multi-modal large language models, zero-shot, real-time

TL;DR: Flashback is a zero-shot, real-time, and explainable VAD system that retrieves from an offline caption memory with lightweight bias controls and runtime encoder selection.

Abstract: Video anomaly detection (VAD) aims to identify unusual events in continuous video streams, yet most existing systems either rely on domain-specific retraining or fail to meet strict real-time demands. We present **Flashback**, a zero-shot and real-time paradigm that reframes VAD as retrieval over an offline pseudo-scene memory. Inspired by how humans recall past experiences to judge the present, Flashback constructs a large set of normal and anomalous captions entirely offline with a language model, embeds them once with a frozen video-text encoder, and reuses this memory online. At inference, each segment is matched against the memory to produce both an anomaly score and a textual rationale, eliminating all online LLM calls and sustaining per-segment deadlines. Three lightweight controls improve robustness: _repulsive prompting_ separates normal and anomalous caption spaces, _scaled anomaly penalization_ corrects residual anomaly bias, and _certainty-driven runtime encoder selection_ maintains weakly-hard real-time guarantees by allocating extra compute only to difficult segments. On UCF-Crime and XD-Violence, Flashback achieves 87.7 AUC and 75.0 AP, respectively, outperforming prior zero-shot methods while providing human-readable explanations at up to 43.8 fps on a single consumer GPU. The result is the first VAD system that is simultaneously zero-shot, real-time, and explainable.
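The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the multiplicative `penalty`, the softmax over the top-k matches, the 0.5 decision point, and the `margin` test for switching encoders are all assumptions chosen for illustration, and the two-dimensional vectors stand in for embeddings from a frozen video-text encoder.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def score_segment(segment_emb, memory_embs, is_anomalous, captions,
                  penalty=0.9, top_k=2):
    """Return (anomaly_score, rationale) for one video segment.

    Assumed scoring rule (hypothetical, not taken from the paper):
    similarities to anomalous captions are scaled by `penalty`, then the
    softmax mass that the top-k matches place on anomalous captions is
    used as the anomaly score.
    """
    sims = l2_normalize(memory_embs) @ l2_normalize(segment_emb)
    # Illustrative stand-in for scaled anomaly penalization.
    sims = np.where(is_anomalous, penalty * sims, sims)
    top = np.argsort(sims)[::-1][:top_k]      # indices of the best matches
    weights = np.exp(sims[top])
    weights /= weights.sum()                  # softmax over the top-k matches
    score = float(weights[is_anomalous[top]].sum())
    return score, captions[top[0]]            # best caption as the rationale

def needs_larger_encoder(score, margin=0.1):
    """Hypothetical certainty test: scores near 0.5 count as uncertain."""
    return abs(score - 0.5) < margin

# Toy caption memory, built once offline in the real system.
captions = ["a person walks down a hallway", "a person breaks a window"]
memory = np.array([[1.0, 0.0], [0.0, 1.0]])
is_anomalous = np.array([False, True])

score_a, why_a = score_segment(np.array([0.1, 1.0]), memory, is_anomalous, captions)
score_n, why_n = score_segment(np.array([1.0, 0.1]), memory, is_anomalous, captions)
print(round(score_a, 3), why_a)
print(round(score_n, 3), why_n)
```

Because the caption embeddings are computed once and reused, the online cost per segment in this sketch is one encoder pass plus a matrix-vector product, which is consistent with the abstract's claim that no LLM is called at inference time.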

Supplementary Material: zip

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 10801
