Enhancing Few-Shot Video Anomaly Detection with Key-Frame Selection and Relational Cross Transformers

Ahmed Fakhry; Jong Taek Lee

Published: 01 Jan 2024, Last Modified: 05 Apr 2025, AVSS 2024, CC BY-SA 4.0

Abstract: Detecting illegal activities through video anomaly detection (VAD) is a major challenge in security and surveillance. Labeled instances of anomalous actions are scarce, which hinders existing learning techniques, and finding a data representation that captures the features and patterns essential for detecting anomalies is difficult. We have developed a few-shot video anomaly detection method, FewVAD, which uses a key-frame selection module and spatial-temporal relational modeling to extract relevant features and reduce temporal redundancy in long surveillance recordings. We evaluated our method on two popular surveillance datasets, UCF-Crime and XD-Violence, and compared it against established few-shot models as well as unsupervised and weakly supervised VAD models. In the 5-way 5-shot few-shot setting, our model attains an accuracy of 41.7% on UCF-Crime and 54.3% on XD-Violence. It also obtains an AUC of 86.60% on the 2-way anomaly detection task on UCF-Crime. FewVAD marks a milestone in few-shot video anomaly detection, competing strongly with current weakly supervised and unsupervised VAD methods.
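The 5-way 5-shot accuracies above come from episodic evaluation: each episode draws five classes, five labeled support clips per class, and a set of query clips that must be assigned to one of those five classes. The sketch below illustrates that protocol only. It assumes precomputed clip-level feature vectors and substitutes a simple nearest-prototype classifier (in the style of prototypical networks) for FewVAD's key-frame selection and relational cross transformer. The function names, the number of queries per class, and the synthetic data are hypothetical.

```python
import random


def sample_episode(dataset, n_way, k_shot, q_queries, rng):
    """Sample one N-way K-shot episode (hypothetical helper).

    dataset maps a class label to a list of feature vectors (lists of floats).
    Returns (support, query): lists of (feature, episode_label) pairs, where
    episode_label is the class's index 0..n_way-1 within this episode.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw disjoint support and query clips for this class.
        items = rng.sample(dataset[cls], k_shot + q_queries)
        support += [(x, episode_label) for x in items[:k_shot]]
        query += [(x, episode_label) for x in items[k_shot:]]
    return support, query


def class_prototypes(support, n_way):
    """Mean support feature per episode class (stand-in classifier, not FewVAD's)."""
    dim = len(support[0][0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for x, y in support:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    return [[s / counts[c] for s in sums[c]] for c in range(n_way)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def episode_accuracy(support, query, n_way):
    """Fraction of query clips assigned to their true class by nearest prototype."""
    protos = class_prototypes(support, n_way)
    correct = 0
    for x, y in query:
        pred = min(range(n_way), key=lambda c: sq_dist(x, protos[c]))
        correct += pred == y
    return correct / len(query)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in data: 8 hypothetical classes, 30 clips each, 4-d features.
    data = {
        f"class_{c}": [[c + rng.gauss(0, 0.1) for _ in range(4)] for _ in range(30)]
        for c in range(8)
    }
    # Reported few-shot accuracy is the mean over many sampled episodes.
    accs = [episode_accuracy(*sample_episode(data, 5, 5, 10, rng), n_way=5)
            for _ in range(100)]
    print(f"mean 5-way 5-shot accuracy: {sum(accs) / len(accs):.3f}")
```

Averaging over many randomly sampled episodes, rather than a single fixed split, is the usual way few-shot accuracy is reported, since any one episode's class draw can be unusually easy or hard.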

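The 2-way result is reported as an AUC, the area under the ROC curve for separating anomalous from normal examples by an anomaly score. The sketch below computes that area through its rank-based (Mann-Whitney U) equivalent: the probability that a randomly chosen anomalous example scores higher than a randomly chosen normal one, with ties counting as one half. It assumes binary labels (1 = anomalous, 0 = normal) and one score per example. Whether the paper scores clips or frames is not stated here, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic (hypothetical helper).

    labels: 0/1 values, 1 = anomalous. scores: higher means more anomalous.
    Tied scores receive their average rank, so ties contribute one half.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one anomalous and one normal example")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank formulation runs in O(n log n), which matters when scores are computed per frame over long surveillance videos; a pairwise comparison of every anomalous and normal example gives the same value but scales quadratically.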