Abstract: The sample generation mechanism plays a pivotal role in contrastive learning. It not only determines the pairing of positive and negative samples but also enriches the diversity of the sample pool, thereby substantially affecting the quality of the learned representations. Yet maintaining semantic consistency within positive sample pairs while increasing sample diversity remains a persistent challenge. To address it, this paper investigates synthesizing semantically consistent samples from multi-source and multi-modal prompts, guided by the capabilities of Large Multimodal Models (LMMs). Through a simple yet effective design, we construct a framework that generates semantics-aware positive sample pairs. Building on this framework, we examine the crucial role of semantic consistency in representation learning through visualization and ablation experiments. We also systematically outline fundamental principles and general methods for generating synthetic samples for contrastive learning with large-model techniques. Extensive experimental results demonstrate the superior performance of our method and reveal related patterns. We will make all code and generated datasets publicly available.
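To make the pairing scheme concrete: once an LMM has synthesized a semantically consistent counterpart for each anchor image, the two can be treated as a positive pair under a standard InfoNCE objective, with the remaining generated samples in the batch serving as negatives. The sketch below illustrates this idea only; it assumes a conventional InfoNCE formulation, and the function name, embedding shapes, and temperature value are illustrative placeholders rather than the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_with_generated_positives(anchors, positives, temperature=0.07):
    """Illustrative InfoNCE loss where each positive is a synthetic sample
    generated from the anchor's semantics (e.g., by an LMM).

    anchors:   (N, D) embeddings of the original samples
    positives: (N, D) embeddings of their generated, semantically
               consistent counterparts (assumed pairing; not the paper's code)
    """
    anchors = F.normalize(anchors, dim=1)
    positives = F.normalize(positives, dim=1)
    # Similarity of every anchor to every generated sample; the diagonal
    # holds the true (anchor, generated-positive) pairs, all off-diagonal
    # entries act as negatives.
    logits = anchors @ positives.t() / temperature
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)

# Toy usage: random embeddings stand in for encoder outputs.
if __name__ == "__main__":
    z_anchor = torch.randn(8, 128)
    z_generated = torch.randn(8, 128)
    print(info_nce_with_generated_positives(z_anchor, z_generated).item())
```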