Video Anomaly Detection via Progressive Learning of Multiple Proxy Tasks

Published: 20 Jul 2024 · Last Modified: 05 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Learning multiple proxy tasks is a popular training strategy in semi-supervised video anomaly detection (VAD). However, the traditional approach of learning multiple proxy tasks simultaneously is prone to suboptimal solutions, and simply executing multiple proxy tasks sequentially cannot ensure continuous performance improvement. In this paper, we thoroughly investigate the impact of task composition and training order on performance. We find that ensuring continuous performance improvement in multi-task learning requires different but continuous optimization objectives across training phases. To this end, we propose a training strategy based on progressive learning to enhance the efficacy of multi-task learning in VAD, in which the learning objectives of earlier phases contribute to training in subsequent phases. Specifically, we decompose video anomaly detection into three phases: perception, comprehension, and inference, continuously refining the learning objectives to enhance model performance. In these three phases, we perform the visual task, the semantic task, and the open-set task in turn to train the model. The model learns different levels of features and focuses on different types of anomalies in each phase. Additionally, we design a simple yet effective semantic task that leverages the semantic consistency of context. Extensive experiments demonstrate the effectiveness of our method, highlighting that the benefits derived from progressive learning transcend specific proxy tasks.
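The phased schedule described in the abstract can be sketched as a loop that hands the weights produced by one phase to the next, so that each proxy task starts from the objectives already learned. This is a minimal illustrative sketch only: the function and variable names (`progressive_train`, `model_state`, the phase/task pairing) are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the three-phase progressive training schedule:
# each phase trains one proxy task, starting from the state left by the
# previous phase, so earlier objectives contribute to later training.

PHASES = [
    ("perception", "visual task"),
    ("comprehension", "semantic task"),
    ("inference", "open-set task"),
]

def progressive_train(model_state, phases=PHASES):
    """Run the phases in order; `model_state` stands in for the model
    weights carried over between phases."""
    history = []
    for phase_name, proxy_task in phases:
        # In a real system this would run an optimization loop for the
        # given proxy task; here we only record the schedule, appending
        # each (phase, task) pair to the carried-over state.
        model_state = model_state + [(phase_name, proxy_task)]
        history.append(phase_name)
    return model_state, history
```

The key design choice the paper argues for is that the phases are sequential but connected: each phase refines, rather than replaces, the optimization objective of the previous one.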
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Video anomaly detection is an essential task in multimedia interpretation. Learning multiple proxy tasks is a popular approach to training models for semi-supervised video anomaly detection. These proxy tasks are usually designed for different modal inputs or different kinds of anomalies. Learning multiple proxy tasks simultaneously may cause the model to converge to a sub-optimal point and perform even worse than training with a single proxy task. Our study extensively investigates the impact of task combination and training sequence on multi-task learning through numerous experiments. This work is dedicated to continuously improving the model's performance as it learns more proxy tasks and more modal information. We introduce a progressive learning approach in which multiple proxy tasks are learned in different phases, enabling the learning objectives of earlier phases to aid the training of subsequent ones. Specifically, we disentangle video anomaly detection into three phases: perception, comprehension, and inference. The model learns different levels of features in different phases and focuses on different classes of anomalies.
Supplementary Material: zip
Submission Number: 693