Dense Dilated Network for Video Action Recognition

Baohan Xu, Hao Ye, Yingbin Zheng, Heng Wang, Tianyu Luwang, Yu-Gang Jiang

Published: 2019, IEEE Trans. Image Process. 2019, Last Modified: 16 Jun 2023

Abstract: The ability to recognize actions throughout a video is essential for surveillance, self-driving, and many other applications. Although many researchers have investigated deep neural networks to improve video action recognition accuracy, these networks usually require large amounts of well-labeled training data. In this paper, we introduce a dense dilated network that collects action information from the snippet level to the global level. The dense dilated network is composed of blocks of densely connected dilated convolutional layers. Our proposed framework fuses the outputs of every layer to learn high-level representations, and these representations remain robust even when only a few training snippets are available. We study different configurations for fusing the spatial and temporal modalities and introduce a novel temporal guided fusion on top of the dense dilated network, which can further boost performance. We conduct extensive experiments on two popular video action datasets, UCF101 and HMDB51, and the results demonstrate the effectiveness of our proposed framework.
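The abstract does not give layer hyperparameters, so the following is only a minimal, hypothetical sketch of the general idea of a block of densely connected dilated temporal convolutions over snippet-level features. It assumes kernel size 3, dilations 1, 2 and 4, a growth of 4 channels per layer, ReLU activations, zero padding, random weights, and temporal average pooling as the global aggregation step. None of these choices, nor the function names, are taken from the paper. Each layer sees the concatenation of the block input and all earlier layer outputs, and the block returns that full concatenation.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: not the authors' implementation.
Matrix = List[List[float]]  # shape [T][C]: T snippets, C feature channels
Weights = List[Matrix]      # shape [3][C_out][C_in]: one matrix per kernel tap


def dilated_conv1d(x: Matrix, weights: Weights, bias: List[float], dilation: int) -> Matrix:
    """Kernel-size-3 dilated temporal convolution followed by ReLU.

    Zero padding keeps the output at T time steps."""
    T, c_in, c_out = len(x), len(x[0]), len(bias)
    out = []
    for t in range(T):
        row = []
        for o in range(c_out):
            acc = bias[o]
            # The three kernel taps look at t - d, t and t + d.
            for k, offset in enumerate((-dilation, 0, dilation)):
                s = t + offset
                if 0 <= s < T:  # positions outside the sequence count as zero
                    for i in range(c_in):
                        acc += weights[k][o][i] * x[s][i]
            row.append(max(acc, 0.0))  # ReLU
        out.append(row)
    return out


def concat_channels(a: Matrix, b: Matrix) -> Matrix:
    """Concatenate two [T][C] feature maps along the channel axis."""
    return [ra + rb for ra, rb in zip(a, b)]


def init_layer(c_in: int, c_out: int, rng: random.Random) -> Tuple[Weights, List[float]]:
    """Random weights for illustration only (assumed, not trained)."""
    weights = [[[rng.uniform(-0.1, 0.1) for _ in range(c_in)] for _ in range(c_out)]
               for _ in range(3)]
    return weights, [0.0] * c_out


def dense_dilated_block(x: Matrix, growth: int, dilations: Sequence[int],
                        rng: random.Random) -> Matrix:
    """Each layer consumes all previous features and appends `growth` channels."""
    features = x
    for d in dilations:
        w, b = init_layer(len(features[0]), growth, rng)
        features = concat_channels(features, dilated_conv1d(features, w, b, d))
    return features


def temporal_avg_pool(x: Matrix) -> List[float]:
    """Average over time to get one video-level feature vector."""
    return [sum(row[c] for row in x) / len(x) for c in range(len(x[0]))]


rng = random.Random(0)
snippets = [[rng.random() for _ in range(8)] for _ in range(10)]  # T=10, C=8
out = dense_dilated_block(snippets, growth=4, dilations=(1, 2, 4), rng=rng)
video_feature = temporal_avg_pool(out)
print(len(out), len(out[0]), len(video_feature))  # 10 20 20
```

With 8 input channels and three layers of growth 4, the block outputs 8 + 3 × 4 = 20 channels per snippet while keeping all 10 time steps. Stacking dilations 1, 2 and 4 with kernel size 3 lets the deepest layer see 1 + 2(1 + 2 + 4) = 15 snippets, which is one way dilation widens temporal context without pooling.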
