Retrieval Backward Attention without Additional Training: Enhance Embeddings of Large Language Models via Repetition
Keywords: Backward Attention, Repetition, Zero-shot, Word Sense Disambiguation, Embedding
TL;DR: We propose ReBA, a repetition and backward attention method that enhances embeddings of decoder-only LLMs, significantly improving zero-shot word sense disambiguation and sentence understanding without extra training.
Abstract: The quality of embeddings produced by pretrained language models is critical to downstream performance. Prior work has shown that repeating input text can improve sentence-level representations, but such repetition may adversely affect word-level embeddings. To address this, we propose ReBA (Retrieval Backward Attention), a method combining input repetition with a novel backward attention mechanism that enables tokens to incorporate future context. Across multiple zero-shot tasks, ReBA significantly improves word-level embeddings while preserving sentence-level gains, offering new insights into enhancing representation quality in decoder-only language models.
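Below is a minimal, illustrative sketch of the general idea described in the abstract, not the authors' implementation. It assumes a decoder-only Hugging Face model (using "gpt2" as a stand-in), that the input is concatenated with one repetition of itself, and that each token of the original copy attends backward over the hidden states of the repeated copy, which have already seen the full sentence under causal attention. All function names, the combination rule, and other details are hypothetical.

```python
# Hypothetical sketch of repetition + backward attention for enhanced
# token embeddings. Not the paper's ReBA implementation; details such as
# the scoring function and the way representations are combined are assumed.
import torch
from transformers import AutoModel, AutoTokenizer


def reba_style_embeddings(text: str, model_name: str = "gpt2") -> torch.Tensor:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids        # (1, n)
    n = ids.size(1)
    repeated = torch.cat([ids, ids], dim=1)                # (1, 2n): text + its repetition

    with torch.no_grad():
        hidden = model(repeated).last_hidden_state[0]      # (2n, d)

    first, second = hidden[:n], hidden[n:]                 # original copy / repeated copy
    d = hidden.size(-1)

    # Backward attention: each original-copy token queries the repeated copy,
    # whose states encode "future" context from the first pass over the text.
    scores = first @ second.T / d ** 0.5                   # (n, n)
    weights = torch.softmax(scores, dim=-1)
    retrieved = weights @ second                            # (n, d) future-aware context

    # Combine original and retrieved representations (simple average here,
    # chosen only for illustration).
    return (first + retrieved) / 2                          # enhanced token embeddings
```

A usage example would be `reba_style_embeddings("The bank raised interest rates.")`, which returns one enhanced embedding per token of the original sentence; sentence-level embeddings could then be obtained by pooling these vectors.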
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16300