Attack-Augmented Mixing-Contrastive Skeletal Representation Learning

Binqian Xu, Xiangbo Shu, Jiachao Zhang, Rui Yan, Guo-Sen Xie

Published: 01 Jan 2026, Last Modified: 31 Mar 2026. IEEE Transactions on Image Processing. License: CC BY-SA 4.0
Abstract: Contrastive learning facilitates the acquisition of informative skeleton representations for unsupervised action recognition by leveraging effective positive and negative sample pairs. However, most existing methods construct these pairs through weak or strong data augmentations that rely on random appearance alterations of skeletons. While such augmentations are somewhat effective, they introduce semantic variations only indirectly and face two inherent limitations. First, simply modifying the appearance of skeletons often fails to reflect meaningful semantic variations. Second, random perturbations can unintentionally blur the boundary between positive and negative pairs, weakening the contrastive objective. To address these challenges, we propose an attack-driven augmentation framework that explicitly introduces semantic-level perturbations. This approach facilitates the generation of hard positives while guiding the model to mine more informative hard negatives. Building on this idea, we present Attack-Augmented Mixing-Contrastive Skeletal Representation Learning (A2MC), a novel framework that contrasts hard positive and hard negative samples for more robust representation learning. Within A2MC, we design an Attack-Augmentation (Att-Aug) module that integrates both targeted (attack-based) and untargeted (augmentation-based) perturbations to generate informative hard positive samples. In parallel, we propose the Positive-Negative Mixer (PNM), which blends hard positive and negative features to synthesize challenging hard negatives. These are then used to update a mixed memory bank for more effective contrastive learning. Comprehensive evaluations across three public benchmarks demonstrate that A2MC achieves performance on par with or exceeding existing state-of-the-art methods.
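The hard-negative synthesis described for the PNM can be illustrated with a minimal feature-mixing sketch. This is an assumption-based illustration of convex feature mixup on the unit hypersphere, not the paper's actual implementation; the function name `mix_hard_negatives` and the mixing coefficient `lam` are hypothetical.

```python
import numpy as np

def mix_hard_negatives(pos_feats, neg_feats, lam=0.5):
    """Synthesize hard negatives by convexly mixing hard positive and
    negative feature vectors, then re-normalizing to unit length
    (a common convention in contrastive learning).

    Illustrative sketch only; the paper's PNM may use a different scheme.
    """
    mixed = lam * pos_feats + (1.0 - lam) * neg_feats
    # L2-normalize each mixed feature row
    norms = np.linalg.norm(mixed, axis=1, keepdims=True)
    return mixed / norms

# Usage: mix 4 positive with 4 negative 128-d features
rng = np.random.default_rng(0)
pos = rng.standard_normal((4, 128))
neg = rng.standard_normal((4, 128))
hard_negs = mix_hard_negatives(pos, neg)
```

Mixed features of this kind sit between the positive and negative clusters, which is what makes them "hard": they are close enough to positives to be informative, yet still serve as negatives in the contrastive objective.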