Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations

ACL ARR 2026 January Submission526 Authors

23 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: distillation, dense retrieval, representation learning, neural networks
Abstract: We present a knowledge distillation framework for text embedding models. A key distinguishing feature is that our distilled models are compatible with their teacher, enabling flexible asymmetric architectures where documents are encoded with the larger teacher model, while queries use smaller student models. We also show that our models automatically inherit MRL and robustness to output quantization whenever these properties are present in the teacher model, without explicitly training for them. To demonstrate the effectiveness of our framework we publish redact-ir, a 23M parameters information retrieval oriented model that, besides being teacher-compatibile, sets a new state-of-the-art (SOTA) on BEIR, ranking no.1 on the public leaderboard for models of its size. Asymmetric mode further increases its retrieval performance. Our scheme is however not restricted to information retrieval. We demonstrate its wider applicability by synthesizing the multi-task redact-mt model. This also sets a new SOTA, achieving no.1 on the public MTEB v2 (English) leaderboard for models of its size. Our technique is applicable to black-box models, requires no judgments nor hard negatives, and training can be conducted using small batch sizes. Thus, dataset and training infrastructure requirements for our framework are modest. We make our models publicly available under a permissive Apache 2.0 license.
Paper Type: Long
Research Area: Low-resource Methods for NLP
Research Area Keywords: distillation, dense retrieval, representation learning, NLP in resource-constrained settings
Contribution Types: Approaches to low compute settings-efficiency
Languages Studied: English
Submission Number: 526