Sparse Temporal Causal Convolution for Efficient Action Modeling

Published: 01 Jan 2019, Last Modified: 14 Nov 2024ACM Multimedia 2019EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Recently, spatio-temporal convolutional networks have achieved prominent performance in action classification. However, debates on the importance of temporal information lead to the rethinking of these architectures. In this work, we propose to employ sparse temporal convolutional operations in networks for efficient action modeling. We demonstrate that the explicit temporal feature interactions can be largely reduced without any degradation. And towards better scalability, we use causal convolutions for temporal feature learning. Under causality constraints, we replenish the model with auxiliary self-supervised tasks, namely video prediction and frame order discrimination. Besides, a gradient based multi-task learning algorithm is introduced for guaranteeing the dominance of action recognition task. The proposed model matches or outperforms the state-of-the-art methods on Kinetics, Something-Something V2, UCF101 and HMDB51 datasets.
Loading