Spatial-Temporal Asynchronous Normalization for Unsupervised 3D Action Representation Learning

Mengyuan Liu, Youneng Bao, Yongsheng Liang, Fanyang Meng

IEEE Signal Process. Lett., 2022 (modified: 05 Nov 2022)

Abstract: Unsupervised 3D action representation learning from skeleton sequences has attracted increasing attention in recent years. Existing methods have successfully applied autoencoder networks to learn 3D action representations by reconstructing the original skeleton sequence. However, these methods ignore motion cues. As a result, they struggle to distinguish actions, especially actions that share similar shape information but differ slightly in motion. Instead of reconstructing the original skeleton sequence, we learn distinctive 3D action representations with an autoencoder network by reconstructing a normalized motion sequence extracted from the original input. To obtain the normalized motion sequence, we design a novel spatial-temporal asynchronous normalization (STAN) method, which normalizes the original skeleton sequence in two steps. First, STAN reduces redundant temporal information and extracts a motion sequence by subtracting the mean value along the temporal dimension. Second, STAN normalizes the motion sequence along the spatial dimension, producing a normalized motion sequence that is less affected by differences in human body shape. Extensive experiments on the large-scale NTU RGB+D 60 and NTU RGB+D 120 datasets verify the effectiveness of the proposed STAN method, which achieves results comparable to state-of-the-art methods and also outperforms alternative normalization methods.
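
The two normalization steps described above can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the authors' implementation. The abstract does not say exactly how the spatial normalization is computed, so `spatial_step` assumes a per-frame z-score over all joints and coordinates. The nested-list layout `seq[t][j][c]` (frame, joint, coordinate), the function names `temporal_step`, `spatial_step` and `stan`, and the `eps` parameter are also illustrative assumptions.

```python
from math import sqrt
from typing import List

# Hypothetical layout: seq[t][j][c] is coordinate c of joint j at frame t.
Sequence = List[List[List[float]]]


def temporal_step(seq: Sequence) -> Sequence:
    """Step 1: subtract each joint coordinate's mean over all frames.

    The result is the motion sequence. Static per-joint offsets (where each
    joint sits on average) are removed, leaving only variation over time.
    """
    num_frames = len(seq)
    num_joints = len(seq[0])
    num_coords = len(seq[0][0])
    # Temporal mean for every (joint, coordinate) pair.
    mean = [
        [sum(seq[t][j][c] for t in range(num_frames)) / num_frames
         for c in range(num_coords)]
        for j in range(num_joints)
    ]
    return [
        [[seq[t][j][c] - mean[j][c] for c in range(num_coords)]
         for j in range(num_joints)]
        for t in range(num_frames)
    ]


def spatial_step(motion: Sequence, eps: float = 1e-8) -> Sequence:
    """Step 2 (ASSUMED form): z-score each frame over all joints and coordinates.

    The abstract only says the motion sequence is normalized along the
    spatial dimension; the per-frame mean/std used here is an illustrative
    choice, not necessarily the paper's.
    """
    normalized = []
    for frame in motion:
        values = [v for joint in frame for v in joint]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        std = sqrt(var) + eps  # eps avoids division by zero for static frames
        normalized.append([[(v - mu) / std for v in joint] for joint in frame])
    return normalized


def stan(seq: Sequence) -> Sequence:
    """Hypothetical STAN sketch: temporal step first, then spatial step."""
    return spatial_step(temporal_step(seq))


if __name__ == "__main__":
    # Toy sequence: 3 frames, 2 joints, 2D coordinates.
    toy = [
        [[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 2.0]],
        [[0.0, 2.0], [1.0, 1.0]],
    ]
    for frame in stan(toy):
        print([[round(v, 3) for v in joint] for joint in frame])
```

Under this assumed form, any time-constant offset added to a joint is removed by the temporal step, and uniformly scaling the whole skeleton cancels out in the spatial step. That is one plausible reading of how the normalization reduces the influence of body size. Non-uniform differences in limb proportions are not removed by this sketch.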
