A generically Contrastive Spatiotemporal Representation Enhancement for 3D skeleton action recognition
Abstract: Highlights
• To extract more discriminative representations that can distinguish ambiguous samples, we propose a generic Contrastive Spatiotemporal Representation Enhancement (CSRE) framework. CSRE decomposes the features into spatial-specific and temporal-specific components and applies contrastive learning to each, in order to explicitly explore the latent data distribution.
• CSRE can be seamlessly incorporated into a variety of existing skeleton encoders. It acts as a plug-and-play module during training and can be removed at test time.
• Extensive experiments show that CSRE yields significant improvements over five state-of-the-art baselines (HCN, 2S-AGCN, CTR-GCN, BlockGCN, and Hyperformer) on three benchmarks.
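The highlights describe the mechanism only at a high level: split encoder features into spatial-specific and temporal-specific parts, apply contrastive learning to each, and use this as a training-only branch. The sketch below shows one way such an auxiliary loss could look. It is not the authors' implementation and makes several assumptions: the per-sample feature map has shape C x T x V (channels, frames, joints), the spatial-specific feature is obtained by averaging over time, the temporal-specific feature by averaging over joints, and the contrastive objective is a supervised contrastive (SupCon-style) loss. All function names are hypothetical. Plain Python lists are used so the example needs only the standard library.

```python
import math

def decompose_spatiotemporal(x):
    """Hypothetical decomposition of one feature map x[c][t][v] (C x T x V).
    Spatial-specific: mean over time -> vector of length C*V.
    Temporal-specific: mean over joints -> vector of length C*T."""
    C, T, V = len(x), len(x[0]), len(x[0][0])
    spatial = [sum(x[c][t][v] for t in range(T)) / T
               for c in range(C) for v in range(V)]
    temporal = [sum(x[c][t][v] for v in range(V)) / V
                for c in range(C) for t in range(T)]
    return spatial, temporal

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / max(norm, eps) for a in vec]

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss over a batch: for each anchor, raise the
    softmax probability of same-label samples among all other samples."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a positive partner are skipped
        # cosine similarity to every sample, scaled by the temperature
        logits = [sum(a * b for a, b in zip(z[i], z[j])) / tau for j in range(n)]
        # numerically stable log-sum-exp over all non-anchor samples
        others = [logits[j] for j in range(n) if j != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(l - m) for l in others))
        # average negative log-probability of the positives for this anchor
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def csre_style_aux_loss(features, labels, tau=0.1):
    """Hypothetical auxiliary loss: contrast spatial-specific and
    temporal-specific features separately and sum the two terms."""
    spatial, temporal = zip(*(decompose_spatiotemporal(x) for x in features))
    return supcon_loss(list(spatial), labels, tau) + supcon_loss(list(temporal), labels, tau)
```

During training, such a term would be added to the classification loss, for example `ce_loss + lam * csre_style_aux_loss(feats, labels)`, where `lam` is a hypothetical weighting factor. At inference only the encoder and classifier run, which matches the highlight that CSRE can be dropped at test time without changing the backbone.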