Revisiting Skeleton-based Action Recognition

Haodong Duan; Yue Zhao; Kai Chen; Dahua Lin; Bo Dai

Revisiting Skeleton-based Action Recognition

Haodong Duan, Yue Zhao, Kai Chen, Dahua Lin, Bo Dai

29 Sept 2021 (modified: 22 Jun 2025)ICLR 2022 Conference Withdrawn SubmissionReaders: Everyone

Keywords: action, skeleton, video, recognition

Abstract: Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt GCNs to extract features on top of human skeletons. Despite the positive results shown in these attempts, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition. PoseConv3D relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseConv3D is more effective in learning spatiotemporal features, more robust against pose estimation noises, and generalizes better in cross-dataset settings. Also, PoseConv3D can handle multiple-person scenarios without additional computation cost, and its features can be easily integrated with other modalities at early fusion stages, providing a great design space to boost the performance. PoseConv3D achieves the state-of-the-art on five of six standard skeleton-based action recognition benchmarks. Once fused with other modalities, it achieves the state-of-the-art on all eight multi-modality action recognition benchmarks.

One-sentence Summary: We propose an alternative 3D-CNN-based approach for skeleton-based action recognition, which achieves state-of-the-art performance and has many good characteristics.

Supplementary Material: zip

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/revisiting-skeleton-based-action-recognition/code)

10 Replies

Loading