Abstract: Weakly supervised Video Anomaly Detection (W-VAD) aims to detect abnormal events in videos given only video-level labels for training. Recent methods relying on multiple instance learning (MIL) and self-training achieve good performance, but they tend to focus on learning easy abnormal patterns while ignoring hard ones, e.g., unusual driving trajectories or speeding. How to detect hard anomalies is a critical but largely ignored problem in W-VAD. To tackle this challenge, we propose a novel framework for W-VAD, termed Abnormal Ratios guided Multi-phase Self-training (ARMS). It includes a new abnormal ratio-based MIL (AR-MIL) loss and a new multi-phase self-training paradigm. The AR-MIL loss guides the learning of hard anomalies by enforcing a minimum ratio of abnormal snippets in an abnormal video and no abnormal snippets in a normal video. Our multi-phase self-training paradigm sequentially performs bootstrapping, hard anomaly mining, and adaptive self-training, so as to generate pseudo labels for easy anomalies, detect hard anomalies, and set adaptive abnormal ratios for different videos in a unified framework. Experimental results on three benchmark datasets, i.e., ShanghaiTech, UCF-Crime, and XD-Violence, show that ARMS outperforms all previous state-of-the-art methods and is particularly effective at detecting hard anomalies.
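The abstract does not give the exact form of the AR-MIL loss, so the following is only a minimal sketch of the idea it describes: a minimum ratio of abnormal snippets in an abnormal video, and none in a normal video. The function name `ar_mil_loss`, the `min_ratio` parameter, and the top-k formulation are assumptions made for illustration and are not taken from the paper.

```python
import math


def ar_mil_loss(scores, is_abnormal, min_ratio=0.1):
    """Illustrative ratio-guided MIL loss for one video (hypothetical form).

    scores      -- per-snippet anomaly scores in [0, 1]
    is_abnormal -- video-level label (True for an abnormal video)
    min_ratio   -- assumed minimum fraction of abnormal snippets
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    if is_abnormal:
        # Assumption: at least ceil(min_ratio * n) snippets should be abnormal,
        # so the k highest-scoring snippets are pushed toward a score of 1.
        k = max(1, math.ceil(min_ratio * len(scores)))
        top_k = sorted(scores, reverse=True)[:k]
        return sum(1.0 - s for s in top_k) / k

    # Normal video: no snippet should be abnormal, so every score is pushed to 0.
    return sum(scores) / len(scores)
```

Under this sketch, raising `min_ratio` forces more snippets of an abnormal video toward high scores, which is one plausible way lower-scoring (hard) anomalous snippets could be brought into training; the actual ARMS formulation may differ.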