Abstract: Highlights
• MSAST: a novel Transformer model for skeleton-based action recognition.
• ASPEM decouples position encoding to capture sample-specific latent dependencies.
• MSEM generates multi-scale tokens for multi-scale feature extraction.
• ARLM learns unique location information for various samples.
• State-of-the-art results on NTU-60 dataset.