LSTM-based multi-label video event detection

An-An Liu, Zhuang Shao, Yongkang Wong, Junnan Li, Yuting Su, Mohan S. Kankanhalli

Multim. Tools Appl. 2019 (modified: 30 Mar 2022)

Abstract: Because large-scale surveillance videos contain complex visual events, generating video descriptions effectively and efficiently without human supervision has become essential. To address this problem, we propose a novel architecture, motivated by the sequence-to-sequence network, for jointly recognizing multiple events in a given surveillance video. The proposed architecture predicts what happens in a video directly, without object detection and tracking as preprocessing steps. We evaluate several variants of the proposed architecture with different visual features on a novel dataset prepared by our group, and we compute a wide range of quantitative metrics to assess it. We further compare it to the popular Support Vector Machine-based visual event detection method. The comparison results suggest that the proposed method can outperform traditional computer vision pipelines for visual event detection.
