PRG-Net: Point Relationship-Guided Network for 3D human action recognition

Published: 27 Jun 2025 · Last Modified: 22 May 2025 · OpenReview Archive Direct Upload · CC BY 4.0
Abstract: Point clouds contain rich spatial information, providing important supplementary cues for human action recognition. Recent action recognition methods based on point cloud sequences rely primarily on complex spatiotemporal local encoding. However, these methods typically select local features with max-pooling operations, which restricts feature updates to local neighborhoods and fails to fully exploit the relationships between regions. Moreover, cross-frame encoding can also lead to the loss of spatiotemporal information. In this study, we propose PRG-Net, a Point Relationship-Guided Network, to further improve the learning of spatiotemporal features from point clouds. First, we design two core modules: the Spatial Feature Aggregation (SFA) module and the Spatial Feature Descriptor (SFD) module. The SFA module expands the spatial structure between regions through dynamic aggregation, while the SFD module guides the region aggregation process with attention-weighted descriptors. Together, they enhance the modeling of human spatial structure by expanding the relationships between regions. Second, we introduce an inter-frame motion encoding technique that obtains the final spatiotemporal representation of the human body by aggregating cross-frame vectors, without relying on complex spatiotemporal local encoding. We evaluate PRG-Net on publicly available human action recognition datasets, including NTU RGB+D 60, NTU RGB+D 120, UTD-MHAD, and MSR Action 3D. Experimental results demonstrate that our method significantly outperforms state-of-the-art point-based 3D action recognition methods. Furthermore, extended experiments on the SHREC 2017 gesture recognition dataset show that our method remains competitive there as well.
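As a rough illustration of the attention-weighted region aggregation the abstract ascribes to the SFA and SFD modules, the minimal PyTorch-style sketch below scores each region feature and combines regions by softmax weights instead of max-pooling alone. The class name, tensor shapes, and scoring network here are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): attention-weighted
# aggregation of per-region point features, so that all regions
# contribute according to learned weights rather than a hard
# max-pooling selection within local neighborhoods.
import torch
import torch.nn as nn

class AttentionWeightedAggregation(nn.Module):
    """Aggregate R region features of shape (B, R, C) into one (B, C) descriptor."""
    def __init__(self, channels: int):
        super().__init__()
        # Assign a scalar score to each region feature; softmax over
        # regions turns the scores into aggregation weights.
        self.score = nn.Sequential(
            nn.Linear(channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 1),
        )

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (batch, regions, channels)
        weights = torch.softmax(self.score(region_feats), dim=1)  # (B, R, 1)
        return (weights * region_feats).sum(dim=1)                # (B, C)

# Usage: 8 sequences, 32 regions, 128-dim features per region.
feats = torch.randn(8, 32, 128)
agg = AttentionWeightedAggregation(128)
print(agg(feats).shape)  # torch.Size([8, 128])
```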
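Similarly, the inter-frame motion encoding described in the abstract can be pictured as forming difference vectors between consecutive per-frame descriptors and aggregating them across the sequence. The sketch below assumes per-frame global features and simple mean/max aggregation; neither choice is specified by the paper.

```python
# Hypothetical sketch (not the paper's code): encode motion as
# cross-frame difference vectors and aggregate them into a single
# sequence descriptor, without per-frame spatiotemporal local encoding.
import torch

def motion_encode(frame_feats: torch.Tensor) -> torch.Tensor:
    # frame_feats: (batch, frames, channels) per-frame global descriptors.
    motion = frame_feats[:, 1:] - frame_feats[:, :-1]    # (B, T-1, C) cross-frame vectors
    appearance = frame_feats.mean(dim=1)                 # (B, C) static context
    dynamics = motion.max(dim=1).values                  # (B, C) dominant motion cues
    return torch.cat([appearance, dynamics], dim=-1)     # (B, 2C) sequence descriptor

feats = torch.randn(8, 24, 128)    # 8 sequences, 24 frames, 128-dim features
print(motion_encode(feats).shape)  # torch.Size([8, 256])
```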