InstEmb: Instruction-Following Embeddings through Look-Ahead Token Distillation

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: instruction-following embedding, representation distillation, contrastive learning, representation learning, large language model
TL;DR: InstEmb enhances instruction-following embeddings by jointly learning primary and complementary semantics via distillation between look-ahead tokens and the golden output.
Abstract: Recent advances have empowered large language models (LLMs) with remarkable fine-grained instruction-following capabilities in text generation tasks. However, embedding methods typically rely solely on the hidden state of the input's last token, limiting their ability to capture the complete semantic signal distributed across the full output tokens. Moreover, existing discrete-to-continuous re-encoding approaches introduce semantic discontinuity. To address these limitations, we propose $\textbf{InstEmb}$, a novel instruction-following embedding framework. InstEmb jointly optimizes two key aspects: (1) primary semantic information, captured via contrastive learning on the representation of the last input token, and (2) complementary semantic information, captured through representation distillation with learnable look-ahead tokens, without introducing additional decoding latency. Additionally, we introduce $\textbf{Dual-Anchor Alignment Pooling (DAAP)}$, a pooling strategy explicitly aligned with our dual training objectives. Extensive experiments demonstrate that InstEmb achieves state-of-the-art performance across multiple instruction-following benchmarks without benchmark-specific supervised data.
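To make the dual objective in the abstract concrete, below is a minimal PyTorch sketch of how such a training loss might be wired up. It assumes a standard in-batch InfoNCE loss for the contrastive term and a cosine-distance distillation between mean-pooled look-ahead token states and mean-pooled golden-output states; all function and tensor names are hypothetical illustrations rather than the authors' implementation, and DAAP is omitted.

    # Hypothetical sketch of InstEmb's dual objective, based only on the abstract.
    # Names (lookahead_states, gold_states, etc.) are illustrative, not the paper's API.
    import torch
    import torch.nn.functional as F

    def contrastive_loss(query_emb, pos_emb, temperature=0.05):
        """InfoNCE over in-batch negatives on last-input-token embeddings."""
        q = F.normalize(query_emb, dim=-1)            # (B, D)
        p = F.normalize(pos_emb, dim=-1)              # (B, D)
        logits = q @ p.T / temperature                # (B, B); diagonal entries are positives
        labels = torch.arange(q.size(0), device=q.device)
        return F.cross_entropy(logits, labels)

    def lookahead_distill_loss(lookahead_hidden, gold_hidden):
        """Pull pooled look-ahead token states toward pooled golden-output states."""
        student = F.normalize(lookahead_hidden.mean(dim=1), dim=-1)   # (B, D)
        teacher = F.normalize(gold_hidden.mean(dim=1), dim=-1)        # (B, D)
        # Cosine-distance distillation; teacher side carries no gradient.
        return (1 - (student * teacher.detach()).sum(dim=-1)).mean()

    # Toy shapes: batch of 4, 8 look-ahead tokens, 32 golden-output tokens, hidden size 16.
    B, K, T, D = 4, 8, 32, 16
    last_token_q, last_token_pos = torch.randn(B, D), torch.randn(B, D)
    lookahead_states = torch.randn(B, K, D, requires_grad=True)
    gold_states = torch.randn(B, T, D)

    loss = contrastive_loss(last_token_q, last_token_pos) \
         + lookahead_distill_loss(lookahead_states, gold_states)
    loss.backward()

Because the look-ahead tokens are processed in the same forward pass as the input, this kind of distillation target adds no decoding steps at inference time, which is consistent with the latency claim in the abstract.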
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 10683