Saliency Prediction of Traffic Surveillance Videos: A Benchmark and A Multi-Task Approach

Published: 01 Jan 2024, Last Modified: 02 Aug 2025 · WCSP 2024 · CC BY-SA 4.0
Abstract: Traffic surveillance videos are crucial for the development of intelligent transportation systems. However, the huge number of these videos introduces significant challenges for storage, transmission, and analysis. Efficient and accurate video saliency prediction (VSP) can therefore benefit a wide range of video processing techniques for traffic scenes, such as video compression, smart navigation, and traffic event detection. However, there are currently no VSP approaches or eye-tracking datasets dedicated to traffic surveillance videos. In this paper, we establish a large-scale eye-tracking dataset, dubbed Traffic Surveillance Videos 1K (TSV1K). TSV1K contains 1000 high-quality traffic surveillance videos with eye-tracking annotations from 30 subjects. Based on our dataset, we conduct a thorough analysis of the correlations between human attention and traffic scenes, e.g., vehicle distribution and scene complexity. Accordingly, we propose a multi-task traffic saliency prediction network (MTTS-Net), which leverages the task of traffic salient object detection (TSOD) to promote the performance of the VSP task. To better learn these tasks, a two-stage training strategy is developed to progressively train MTTS-Net. Experimental results demonstrate that our proposed approach outperforms state-of-the-art approaches on both the TSOD and VSP tasks for traffic surveillance videos. Our dataset and code are available at https://github.com/giteec/TSV1K.