Which One is Better? Self-supervised Temporal Coherence Learning for Skeleton Based Action Recognition

Published: 2022, Last Modified: 05 Nov 2023, IJCB 2022, Readers: Everyone
Abstract: Recently, researchers have achieved significant results in skeleton-based action recognition. To better model skeleton sequences, existing methods learn feature representations in a self-supervised setting by solving pretext tasks, such as predicting the order of a shuffled skeleton sequence or verifying whether a given skeleton sequence has been shuffled. However, these pretext tasks are either too hard or too easy for the encoder to learn a skeleton representation suited to action recognition. We therefore propose a novel self-pretraining pretext task, Which One Is Better (WOIB): given two shuffled skeleton sequences, the encoder must identify which one is more temporally coherent. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets with different network architectures show significant improvements in recognition accuracy, indicating that this pretext task generalizes across architectures and drives the encoder to learn more discriminative representations.
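To make the WOIB idea concrete, here is a minimal sketch of how one training pair might be built. The abstract does not specify how sequences are shuffled or how temporal coherence is scored, so everything below is an assumption made for illustration: the helper names (`partial_shuffle`, `coherence`, `make_woib_pair`), the coherence proxy (fraction of adjacent frames still in their original succession), and the `mild`/`heavy` shuffle strengths are all hypothetical.

```python
import numpy as np


def partial_shuffle(num_frames, num_moved, rng):
    """Return a frame-index permutation where only `num_moved` positions are permuted.

    Hypothetical helper: the paper's actual shuffling scheme is not given here.
    """
    order = np.arange(num_frames)
    positions = rng.choice(num_frames, size=num_moved, replace=False)
    # Permute the chosen positions among themselves; all others stay in place.
    order[positions] = order[rng.permutation(positions)]
    return order


def coherence(order):
    """Assumed proxy: fraction of adjacent frames still in original succession."""
    return float(np.mean(np.diff(order) == 1))


def make_woib_pair(seq, rng, mild=4, heavy=12):
    """Build two shuffled views of `seq` plus a binary label.

    The label is 0 if the first view is more temporally coherent
    (under the assumed proxy), otherwise 1.
    """
    num_frames = seq.shape[0]
    while True:
        a = partial_shuffle(num_frames, mild, rng)
        b = partial_shuffle(num_frames, heavy, rng)
        if coherence(a) != coherence(b):  # resample on ties so the label is defined
            break
    if rng.random() < 0.5:  # randomize which slot holds the more coherent view
        a, b = b, a
    label = 0 if coherence(a) > coherence(b) else 1
    return seq[a], seq[b], label


rng = np.random.default_rng(0)
# Illustrative shape: 32 frames, 25 joints, 3D coordinates.
seq = rng.normal(size=(32, 25, 3))
view_1, view_2, label = make_woib_pair(seq, rng)
```

Under these assumptions, an encoder would embed both views and a small classification head would predict `label`. The shuffle strengths control how far apart the two views are in coherence, which is one plausible way to tune the difficulty of the pretext task.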