Human Action Recognition Based on Temporal Pose CNN and Multi-dimensional Fusion

Yi Huang, Shang-Hong Lai, Shao-Heng Tai

2018 (modified: 11 Nov 2022) | ECCV Workshops (2) 2018 | Readers: Everyone

Abstract: To take advantage of recent advances in human pose estimation from images, we develop a deep neural network model for action recognition from videos that computes temporal human pose features with a 3D CNN. The proposed temporal pose features can provide more discriminative human action information than previous video features such as appearance and short-term motion. In addition, we propose a novel fusion network that combines temporal pose, spatial, and motion feature maps for classification by bridging the dimension gap between 3D and 2D CNN feature maps. Experiments on the Sub-JHMDB and PennAction datasets show that the proposed action recognition system achieves higher accuracy than previous methods.
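
The abstract does not say how the fusion network reconciles 3D and 2D feature maps. As an illustration only, one common way to bridge such a dimension gap is to pool the 3D (temporal) feature volume over its time axis and then concatenate the result with the 2D maps along the channel axis. The sketch below assumes that approach; the function names, the `[T][C][H][W]` layout, and the choice of average pooling are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a 3D-to-2D feature fusion step.
# Assumption: temporal average pooling followed by channel concatenation.
# This is NOT the authors' fusion network, which the abstract does not detail.


def temporal_average_pool(volume):
    """Collapse a [T][C][H][W] feature volume to [C][H][W] by averaging over T."""
    t = len(volume)
    channels = len(volume[0])
    height = len(volume[0][0])
    width = len(volume[0][0][0])
    return [
        [
            [sum(volume[k][c][y][x] for k in range(t)) / t for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]


def fuse_feature_maps(pose_volume, spatial_map, motion_map):
    """Pool the 3D pose volume, then stack it with the 2D maps along channels."""
    pose_map = temporal_average_pool(pose_volume)
    for name, fmap in (("spatial", spatial_map), ("motion", motion_map)):
        same_height = len(fmap[0]) == len(pose_map[0])
        same_width = len(fmap[0][0]) == len(pose_map[0][0])
        if not (same_height and same_width):
            raise ValueError(f"{name} map must match the pose map's height and width")
    # List concatenation on the outermost (channel) axis.
    return pose_map + spatial_map + motion_map
```

Under these assumptions, a pose volume with T frames and C channels contributes C channels to the fused map, so the classifier that follows sees a single 2D feature map whose channel count is the sum of the three inputs.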
