Universal Embeddings via Reconstruction-Augmented Contrastive Learning

ICLR 2026 Conference Submission 17463 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Universal embeddings, semantic richness, autoregressive reconstruction task
TL;DR: A novel unified embedding framework that leverages Reconstruction-augmented Contrastive learning to learn discriminative representations enriched with semantic information.
Abstract: Universal multimodal embedding models play a crucial role in tasks such as multimodal search, recommendation, and retrieval-augmented generation. However, standard contrastive learning frameworks typically optimize embeddings by pulling positives closer and pushing negatives apart, without explicitly requiring the embeddings to preserve the rich semantics of the inputs. This often yields representations that are moderately discriminative but semantically impoverished, resulting in suboptimal retrieval performance. In this work, we propose ReCo, a novel unified embedding framework that leverages Reconstruction-augmented Contrastive learning to learn discriminative representations enriched with semantic information. By forcing the model to reconstruct each instance's semantic content solely from its encoded embedding, ReCo produces representations that are both semantically richer and more discriminative than those learned with contrastive learning alone, yielding a unified representation space well suited to embedding tasks. Extensive experiments on the MMEB benchmark and multiple cross-modal retrieval tasks demonstrate that ReCo outperforms existing state-of-the-art models.
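To make the training objective concrete, below is a minimal sketch of a reconstruction-augmented contrastive loss. It assumes an in-batch InfoNCE contrastive term combined with an autoregressive token-reconstruction term; the abstract does not specify the exact losses or weighting, so the function name `reco_loss`, the `temperature` and `recon_weight` hyperparameters, and the tensor interfaces are all hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def reco_loss(query_emb, target_emb, decoder_logits, target_tokens,
              temperature=0.07, recon_weight=1.0):
    """Hypothetical sketch of a reconstruction-augmented contrastive objective.

    query_emb, target_emb: (B, D) embeddings of paired inputs.
    decoder_logits: (B, T, V) logits from a decoder conditioned only on the
        encoded embedding, predicting the instance's token sequence.
    target_tokens: (B, T) token ids of the semantic content to reconstruct.
    """
    # Contrastive term: in-batch InfoNCE, pulling matched pairs together
    # and pushing all other in-batch pairs apart.
    q = F.normalize(query_emb, dim=-1)
    t = F.normalize(target_emb, dim=-1)
    logits = q @ t.T / temperature                      # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)   # diagonal entries are positives
    contrastive = F.cross_entropy(logits, labels)

    # Reconstruction term: autoregressive cross-entropy over the target
    # tokens, forcing the embedding alone to carry the instance semantics.
    recon = F.cross_entropy(
        decoder_logits.reshape(-1, decoder_logits.size(-1)),
        target_tokens.reshape(-1),
    )
    return contrastive + recon_weight * recon
```

Under these assumptions, the reconstruction term acts as a semantic bottleneck regularizer on top of the discriminative contrastive term, with `recon_weight` trading off the two.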
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 17463