Abstract: Action recognition based on 2D information faces intrinsic difficulties such as occlusion and viewpoint variation, and suffers especially under complicated changes of perspective. In this paper, we present a straightforward and efficient approach to 3D human action recognition based on skeleton sequences. A coarse geometric feature, termed planes of 3D joint motions vector (PoJM3D), is extracted from the raw skeleton data to capture omnidirectional short-term motion cues. A customized 3D convolutional neural network then learns a global long-term representation of spatial appearance and temporal motion with a scheme called dynamic temporal sparse sampling (DTSS). Extensive experiments on three public benchmark datasets, including UTD-MVAD, UTD-MHAD, and CAS-YNU-MHAD, demonstrate that our method matches the current state of the art in cross-view evaluation and improves on it significantly in cross-subject evaluation. The code of our proposed approach has been released on GitHub.