Graph transformer network with temporal kernel attention for skeleton-based action recognition

Published: 01 Jan 2022 · Last Modified: 13 Nov 2024 · Knowl. Based Syst. 2022 · CC BY-SA 4.0
Abstract: Skeleton-based human action recognition has attracted widespread attention, as skeleton data adapt robustly to dynamic circumstances such as camera-view changes and background interference, allowing recognition methods to focus on robust features. In recent studies, the human body is modeled as a topological graph, and a graph convolutional network (GCN) is used to extract action features. Although GCNs learn spatial patterns well, they overlook the varying degrees of higher-order dependencies between joints, which local message passing fails to capture. Moreover, the joints represented by vertices are interdependent, so incorporating an attention mechanism to weigh these dependencies is beneficial. In this work, we propose a kernel attention adaptive graph transformer network (KA-AGTN), which models the higher-order spatial dependencies between joints with a graph transformer operator based on multihead self-attention. In addition, the Temporal Kernel Attention (TKA) block in KA-AGTN generates channel-level attention scores from temporal features, enhancing temporal motion correlation. Combined with a two-stream framework and an adaptive graph strategy, KA-AGTN outperforms the baseline 2s-AGCN by 1.9% and 1.0% under X-Sub and X-View on the NTU-RGBD 60 dataset, by 3.2% and 3.1% under X-Sub and X-Set on the NTU-RGBD 120 dataset, and by 2.0% and 2.3% under Top-1 and Top-5 on the Kinetics-Skeleton 400 dataset, achieving state-of-the-art performance.
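To make the spatial component concrete, below is a minimal PyTorch sketch of a multihead self-attention operator over skeleton joints, illustrating the kind of graph transformer layer the abstract describes. All names here (`GraphTransformerLayer`, `adj_bias`) are illustrative assumptions, not the authors' actual KA-AGTN code; the point is that attention lets every joint attend to every other joint, beyond one-hop graph convolution.

```python
import torch
import torch.nn as nn

class GraphTransformerLayer(nn.Module):
    """Hypothetical multihead self-attention layer over skeleton joints."""
    def __init__(self, in_channels, out_channels, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(in_channels, out_channels)
        self.attn = nn.MultiheadAttention(out_channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(out_channels)

    def forward(self, x, adj_bias=None):
        # x: (batch, num_joints, in_channels) -- per-frame joint features.
        # adj_bias: optional (num_joints, num_joints) additive attention bias
        # derived from the skeleton graph; None lets every joint attend to every
        # other joint, capturing higher-order dependencies beyond local message
        # passing. How KA-AGTN injects graph structure is an assumption here.
        h = self.proj(x)
        out, _ = self.attn(h, h, h, attn_mask=adj_bias)
        return self.norm(h + out)  # residual connection then normalization

# Example: 25 joints (NTU-RGBD skeleton), 3D coordinates in, 64 channels out.
layer = GraphTransformerLayer(3, 64, num_heads=8)
frame = torch.randn(2, 25, 3)   # (batch, joints, xyz)
print(layer(frame).shape)       # torch.Size([2, 25, 64])
```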
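For the temporal component, the following is a hedged sketch of a channel-level attention block computed from temporal features, in the spirit of the TKA block described above. This squeeze-and-excitation-style design (pool over frames and joints, then score each channel) is an assumption for illustration only; the paper's exact TKA formulation may differ.

```python
import torch
import torch.nn as nn

class TemporalChannelAttention(nn.Module):
    """Hypothetical TKA-style block: channel attention from temporal features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Bottleneck MLP that maps pooled features to per-channel scores.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels, frames, joints) -- common skeleton feature layout.
        squeeze = x.mean(dim=(2, 3))        # average over time and joints
        score = self.fc(squeeze)            # channel-level attention in (0, 1)
        return x * score[:, :, None, None]  # reweight channels, shape preserved

tka = TemporalChannelAttention(64)
feats = torch.randn(2, 64, 300, 25)  # (batch, channels, frames, joints)
print(tka(feats).shape)              # torch.Size([2, 64, 300, 25])
```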
