Asynchronous Joint-Based Temporal Pooling for Skeleton-Based Action Recognition

Published: 2025, Last Modified: 13 Mar 2026, Venue: IEEE Trans. Circuits Syst. Video Technol. 2025, License: CC BY-SA 4.0
Abstract: Deep neural networks for skeleton-based human action recognition (HAR) often use conventional average or max temporal pooling to aggregate features, treating all joints and frames equally. However, this approach can aggregate an excessive amount of weakly discriminative or even non-discriminative features into the final feature vectors used for recognition. To address this issue, this paper introduces a novel method called asynchronous joint-based temporal pooling (AJTP). The method aims to enhance action recognition by identifying a set of informative joints across the temporal dimension and applying joint-based, asynchronous, motion-preserving pooling rather than conventional frame-based pooling. The effectiveness of the proposed AJTP has been empirically validated by integrating it with popular Graph Convolutional Network (GCN) models on three benchmark datasets: NTU RGB+D 120, PKU-MMD, and Kinetics-400. The results show that a GCN model with AJTP substantially outperforms its counterpart GCN model with conventional temporal pooling. The source code is available at https://github.com/ShanakaRG/AJTP.
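The contrast drawn in the abstract between frame-based and joint-based pooling can be illustrated with a small sketch. This is not the authors' AJTP implementation (see the linked repository for that); it is a toy example under stated assumptions. The function names `global_average_pool` and `per_joint_topk_pool`, the parameter `k`, and the use of the feature L2 norm as a salience score are all hypothetical choices made for illustration. Features are assumed to be laid out as `features[t][v]`, a list of channel values for joint `v` at frame `t`.

```python
from math import sqrt


def global_average_pool(features):
    """Conventional pooling: average over every frame and joint equally."""
    num_channels = len(features[0][0])
    pooled = [0.0] * num_channels
    count = 0
    for frame in features:
        for joint_vec in frame:
            for c, value in enumerate(joint_vec):
                pooled[c] += value
            count += 1
    return [x / count for x in pooled]


def l2_norm(vec):
    """Euclidean norm, used here as an ASSUMED salience score."""
    return sqrt(sum(x * x for x in vec))


def per_joint_topk_pool(features, k):
    """Hypothetical joint-wise pooling (illustration only, not AJTP itself).

    Each joint independently keeps its own k highest-norm frames, so
    different joints may contribute features from different time steps.
    """
    num_frames = len(features)
    num_joints = len(features[0])
    num_channels = len(features[0][0])
    pooled = [0.0] * num_channels
    count = 0
    for v in range(num_joints):
        # Rank frames for this joint only; other joints are ranked separately.
        ranked = sorted(
            range(num_frames),
            key=lambda t: l2_norm(features[t][v]),
            reverse=True,
        )
        for t in ranked[:k]:
            for c in range(num_channels):
                pooled[c] += features[t][v][c]
            count += 1
    return [x / count for x in pooled]


# Toy input: 3 frames, 2 joints, 1 channel.
# Joint 0 is active only in the last frame, joint 1 only in the first.
demo_features = [
    [[0.0], [4.0]],
    [[0.0], [0.0]],
    [[2.0], [0.0]],
]
demo_global = global_average_pool(demo_features)        # [1.0]
demo_joint = per_joint_topk_pool(demo_features, k=1)    # [3.0]
```

In the toy input the two joints are active at different frames, so equal-weight averaging dilutes both responses with inactive entries, while the per-joint selection keeps each joint's most active frame. How AJTP actually selects informative joints and preserves motion is defined in the paper, not here.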