Unlocking Decoder-LLMs for Text Embedding with Instructions, Soft Supervision and Curriculum Learning

ICLR 2026 Conference Submission 11534 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Text Embedding, Large Language Models (LLMs), Curriculum Learning, Contrastive Learning, Knowledge Distillation
Abstract: Large language models (LLMs) are increasingly used for text embedding, yet decoder-only architectures remain largely underexplored for this purpose. We present a unified instruction-based framework that adapts decoder-only LLMs into general-purpose text encoders without architectural modifications. Our approach integrates four complementary techniques: (i) in-context learning with structured instructions to generate context-aware embeddings without costly fine-tuning, (ii) soft supervision via knowledge distillation from a high-performance teacher retrieval pipeline, (iii) adaptive margin-based hard-negative mining to stabilize contrastive learning, and (iv) a principled two-stage curriculum learning strategy that first builds a semantic foundation on Semantic Textual Similarity (STS) before specializing in retrieval tasks. Our analysis shows that this sequential curriculum is critical for robust performance, substantially outperforming simultaneous multi-task training. Evaluated on the 41 diverse tasks of the MTEB (English, v2) benchmark, our model achieves state-of-the-art results and consistently ranks among the top models, demonstrating both strong overall performance and greater robustness than larger or fully fine-tuned models. Notably, it excels in semantically demanding categories such as Retrieval, Semantic Textual Similarity, and Summarization. These results highlight the effectiveness of strategically combining instruction-based prompting, soft-label distillation, adaptive sampling, and curriculum learning to unlock the potential of decoder-only LLMs as powerful and flexible text embedding models.
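
To make the abstract's technique (iii) concrete, below is a minimal sketch (not the authors' code) of what adaptive margin-based hard-negative filtering inside an InfoNCE-style contrastive loss could look like. It assumes cosine similarity, a per-query margin derived from the positive similarity, and illustrative hyperparameter values; all names and defaults are assumptions, not details taken from the paper.

```python
# Minimal sketch of adaptive margin-based hard-negative filtering for
# contrastive training. All function names, shapes, and hyperparameters
# are illustrative assumptions.
import torch
import torch.nn.functional as F


def masked_info_nce(query, positive, negatives, margin=0.1, temperature=0.05):
    """query: (B, d), positive: (B, d), negatives: (B, K, d)."""
    q = F.normalize(query, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)

    pos_sim = (q * p).sum(-1, keepdim=True)        # (B, 1) cosine similarity to the positive
    neg_sim = torch.einsum("bd,bkd->bk", q, n)     # (B, K) cosine similarity to each negative

    # Adaptive margin: negatives scoring within `margin` of the positive are
    # treated as likely false negatives and masked out of the loss.
    false_neg_mask = neg_sim > (pos_sim - margin)
    neg_sim = neg_sim.masked_fill(false_neg_mask, float("-inf"))

    # Standard InfoNCE: the positive sits at index 0 of the logits.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

In a pipeline like the one the abstract describes, a loss of this form could be combined with a soft-label distillation term (e.g., KL divergence against teacher relevance scores over the same candidates) and applied in the retrieval-specialization stage of the curriculum; those combinations are not specified here and are left as assumptions.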
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 11534