Abstract: Graph Convolutional Networks (GCNs) have attracted significant attention in skeleton-based action recognition due to their ability to exploit the intrinsic topological structure of skeletons. In GCN-based methods, extracting discriminative features from the skeletal topology is crucial for improving recognition accuracy. However, most existing works extract spatial and temporal features in alternating, separate stages, neglecting the spatio-temporal features that evolve in parallel during motion. To address this issue, we propose a novel Spatio-Temporal Motion Topology Aware Graph Convolutional Network (STMTA-GCN) for learning rich spatio-temporal motion information flow. The core of the network is the Spatio-Temporal Motion Feature Extraction (STMFE) block, which jointly learns temporal, spatial, and spatio-temporal information flow between human joints and comprises two modules: Spatio-Temporal Interaction Enhanced Graph Convolution (STIE-GC) and Multi-scale Temporal Information Extraction (MTIE). STIE-GC directly models the interrelations between joints and their spatio-temporal neighbors in the spatio-temporal graph and captures channel-level spatio-temporal parallel features. MTIE strengthens the model's ability to perceive temporal dynamics. Extensive experiments on three public datasets, NTU RGB+D 60, NTU RGB+D 120, and NW-UCLA, demonstrate that our method achieves performance better than or comparable to state-of-the-art methods.
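
To make the abstract's description concrete, the following is a minimal, hypothetical PyTorch sketch of a spatio-temporal feature extraction block in the spirit described above: a graph convolution that aggregates over joint neighbors, followed by parallel dilated temporal convolutions for multi-scale temporal extraction. It is not the authors' STMFE/STIE-GC/MTIE implementation; all class and parameter names, the input layout (N, C, T, V), and the placeholder adjacency are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): a spatio-temporal block coupling a
# spatial graph convolution with multi-scale temporal convolutions.
# Assumed skeleton input shape: (N, C, T, V) = (batch, channels, frames, joints).
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    """Hypothetical block: spatial graph conv + parallel dilated temporal convs."""

    def __init__(self, in_ch, out_ch, adjacency, dilations=(1, 2)):
        super().__init__()
        # Normalised adjacency of the skeleton graph, shape (V, V); a learnable
        # residual lets the block refine joint relations during training.
        self.register_buffer("A", adjacency)
        self.A_res = nn.Parameter(torch.zeros_like(adjacency))
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # Parallel temporal branches with different dilations approximate
        # multi-scale temporal information extraction.
        self.temporal = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(out_ch, out_ch, kernel_size=(5, 1),
                          padding=(2 * d, 0), dilation=(d, 1)),
                nn.BatchNorm2d(out_ch),
            )
            for d in dilations
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                               # x: (N, C, T, V)
        A = self.A + self.A_res                         # refined adjacency
        x = torch.einsum("nctv,vw->nctw", x, A)         # aggregate joint neighbours
        x = self.relu(self.spatial(x))                  # channel mixing
        x = sum(branch(x) for branch in self.temporal)  # multi-scale temporal fusion
        return self.relu(x)


if __name__ == "__main__":
    V = 25                                   # e.g. NTU RGB+D joint count
    A = torch.eye(V)                         # placeholder adjacency for illustration
    block = SpatioTemporalBlock(3, 64, A)
    out = block(torch.randn(2, 3, 64, V))    # (batch, channels, frames, joints)
    print(out.shape)                         # torch.Size([2, 64, 64, 25])
```

In this sketch, spatial aggregation and temporal convolution are applied within one block rather than in fully separate stages; the paper's STIE-GC additionally operates on spatio-temporal neighborhoods and channel-level interactions, which this simplified example does not reproduce.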