Dynamic Self-attention Gated Spatial-Temporal Graph Convolutional Network for Skeleton-Based Human Activity Recognition

Yi Xia, Sira Yongchareon, Raymond Lutui, Quan Z. Sheng

Published: 2024, Last Modified: 21 Jan 2026 | ICONIP (5) 2024 | CC BY-SA 4.0

Abstract: Human Activity Recognition (HAR) involves the automatic detection and classification of human actions from sensor data, enabling numerous applications in fields such as healthcare, security, and human-computer interaction. Skeleton-based HAR has gained significant attention because it is robust to variations in appearance and environmental conditions, leveraging the structural information of human poses to recognize and classify activities accurately. Despite significant advancements, current deep learning models often use fixed topological structures and cannot directly construct long-range features or distinguish subtle motions. To address these challenges, we propose a Dynamic Self-Attention Gated Spatial-Temporal Graph Convolutional Network (DSAT-GCN) that integrates graph convolutional networks (GCNs) with advanced attention mechanisms. The DSAT-GCN consists of a Gated Self-Attention module that enhances the capture of long-range dependencies, a Dynamic Feature Fusion module that adaptively integrates features across multiple scales, and a Spatial Graph Convolution module and a Temporal Convolution module that model spatial relationships and temporal dynamics. The results show that our model significantly outperforms existing methods, achieving accuracies of 92.6% (X-Sub) and 96.9% (X-View) on NTU RGB+D 60, 89.2% (X-Sub) and 90.4% (X-View) on NTU RGB+D 120, and 89% (Top-1) and 94.5% (Top-5) on Kinetics Skeleton. Additionally, ablation studies confirm the critical contributions of each module to the overall performance of our model.
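
The abstract gives no equations, so the following is only an illustrative sketch of the general idea of combining a spatial graph convolution (local, skeleton-bound aggregation) with a gated self-attention branch (long-range aggregation across joints). It is not the authors' DSAT-GCN implementation. Everything here is an assumption made for illustration: the function names, the symmetric normalization of the adjacency matrix, the identity query/key/value projections, the scalar sigmoid gate, and the toy three-joint skeleton.

```python
import math


def normalize_adjacency(edges, num_joints):
    """Return D^{-1/2} (A + I) D^{-1/2} as a nested list (standard GCN normalization)."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loops so each joint keeps its own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # undirected bone between joints i and j
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product attention over joints.

    Assumption: queries, keys and values are the raw features (no learned
    projections), which keeps the sketch free of trainable parameters.
    """
    d = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)  # joint-to-joint similarity, shape (joints, joints)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)


def gated_block(x, a_hat, gate_logit=0.0):
    """Blend long-range (attention) and local (graph) features.

    Assumption: a single scalar gate g = sigmoid(gate_logit); a trained model
    would typically learn the gate, possibly per joint or per channel.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    local = matmul(a_hat, x)          # aggregation along skeleton edges
    long_range = self_attention(x)    # aggregation across all joints
    return [[g * lr + (1.0 - g) * lo for lr, lo in zip(lr_row, lo_row)]
            for lr_row, lo_row in zip(long_range, local)]


# Toy usage: a 3-joint chain (0-1-2) with 2 feature channels per joint.
a_hat = normalize_adjacency([(0, 1), (1, 2)], num_joints=3)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
blended = gated_block(features, a_hat, gate_logit=0.0)  # equal mix of both branches
```

In this sketch the gate interpolates between the two branches: a strongly negative `gate_logit` leaves essentially only the graph-convolution output, while a strongly positive one leaves essentially only the attention output. Temporal convolution and multi-scale feature fusion, which the abstract also mentions, are omitted.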

External IDs: dblp:conf/iconip/XiaYLS24