Semantic Data Inflation: Adaptive Augmentation for Contrastive Representation Learning

18 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: contrastive learning, semantic data augmentation, self-supervised learning, representation learning, multi-scale
TL;DR: A novel semantic-guided data augmentation framework that improves contrastive learning by preserving object identity while maximizing feature diversity.
Abstract: Self-supervised representation learning requires semantically meaningful data augmentations to learn effective features. However, current augmentation strategies either disrupt semantic structures or risk semantic drift. We present Semantic Data Inflation (SDI), a novel framework inspired by the human visual system that leverages explicit semantic guidance from pre-trained models to enhance representation quality. SDI extracts multi-level semantic cues to create consistent augmented views while maintaining critical object identities. Our multi-scale adaptive mechanism dynamically selects optimal semantic extraction strategies based on image characteristics, ensuring robust performance across diverse conditions. Extensive experiments demonstrate that SDI consistently outperforms baseline and generative methods across multiple contrastive learning frameworks. Crucially, we validate the scalability of our approach on ImageNet-1k, demonstrating significant gains over standard baselines. On ImageNette, our approach reaches 95.75% linear evaluation accuracy, surpassing standard (+3.88%) and generative (+3.65%) methods. Further analysis confirms SDI produces more discriminative features with improved semantic consistency. Our code is available at https://anonymous.4open.science/r/Semantic-Data-Inflation-8D7D.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 12000