Local global information aggregation graph convolution for skeleton-based action recognition

Published: 01 Jan 2025, Last Modified: 20 May 2025. Neurocomputing 2025. License: CC BY-SA 4.0
Abstract: In skeleton-based human action recognition, capturing the complex features of skeletal sequences in both the spatial and temporal dimensions is a significant challenge. Existing Graph Convolutional Network (GCN) methods typically focus on extracting spatial features through various approaches while neglecting the temporal characteristics inherent in human actions. In this paper, we propose the L(ocal) G(lobal)-GCN, a novel approach that emphasizes the under-explored temporal dimension by segmenting time to capture both local and global information. We also introduce a relative position embedding method tailored to skeletal data that enhances the representation capability of graph convolution. Additionally, we propose a partitioning method that exploits the sequential nature of skeletal data to reduce computational complexity while retaining all information. Our LG-GCN matches or surpasses state-of-the-art accuracy on three widely used datasets: NTU RGB+D with 97.3% (X-View), NTU RGB+D 120 with 91.1% (X-Set), and NW-UCLA with 96.6%.
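The abstract's core idea of segmenting time to combine local and global information can be sketched as follows. This is an illustrative toy example only, not the paper's actual architecture: the segment count, mean pooling, and the `local_global_features` helper are all assumptions made for demonstration.

```python
import numpy as np

def local_global_features(x, num_segments=4):
    """Toy sketch of local/global temporal aggregation (illustrative only).

    Split a skeleton sequence along time into segments, pool each segment
    (local view) and the whole sequence (global view), then concatenate the
    two views channel-wise.

    x: array of shape (T, V, C) - frames, joints, coordinate channels.
    Returns an array of shape (num_segments, V, 2*C).
    """
    segments = np.array_split(x, num_segments, axis=0)        # local temporal windows
    local = np.stack([s.mean(axis=0) for s in segments])      # (S, V, C) per-segment pooling
    global_ = np.broadcast_to(x.mean(axis=0), local.shape)    # (S, V, C) whole-sequence pooling
    return np.concatenate([local, global_], axis=-1)          # (S, V, 2C)

# Toy skeleton sequence: 64 frames, 25 joints (NTU RGB+D layout), 3-D coordinates.
feats = local_global_features(np.random.randn(64, 25, 3), num_segments=4)
print(feats.shape)  # (4, 25, 6)
```

Each output row thus carries both a fine-grained view of its own time window and a summary of the entire action, which is the intuition behind combining local and global temporal information.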