Retrieval Backward Attention without Additional Training: Enhance Embeddings of Large Language Models via Repetition
Keywords: Backward Attention, Repetition, Zero-shot, Word Sense Disambiguation, Embedding
TL;DR: We propose ReBA, a repetition and backward attention method that enhances embeddings of decoder-only LLMs, significantly improving zero-shot word sense disambiguation and sentence understanding without extra training.
Abstract: The quality of embeddings produced by pretrained language models is critical to downstream performance. Prior work has shown that repeating input text can improve sentence-level representations, but such repetition may adversely affect word-level embeddings. To address this, we propose ReBA (Retrieval Backward Attention), a method combining input repetition with a novel backward attention mechanism that enables tokens to incorporate future context. Across multiple zero-shot tasks, ReBA significantly improves word-level embeddings while preserving sentence-level gains, offering new insights into enhancing representation quality in decoder-only language models.
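Below is a minimal, illustrative sketch of the general idea described in the abstract, not the authors' implementation. It assumes a decoder-only Hugging Face model (using "gpt2" as a stand-in), that the input is concatenated with one repetition of itself, and that each token of the original copy attends backward over the hidden states of the repeated copy, which have already seen the full sentence under causal attention. All function names, the combination rule, and other details are hypothetical.

```python
# Hypothetical sketch of repetition + backward attention for enhanced
# token embeddings. Not the paper's ReBA implementation; details such as
# the scoring function and the way representations are combined are assumed.
import torch
from transformers import AutoModel, AutoTokenizer


def reba_style_embeddings(text: str, model_name: str = "gpt2") -> torch.Tensor:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids        # (1, n)
    n = ids.size(1)
    repeated = torch.cat([ids, ids], dim=1)                # (1, 2n): text + its repetition

    with torch.no_grad():
        hidden = model(repeated).last_hidden_state[0]      # (2n, d)

    first, second = hidden[:n], hidden[n:]                 # original copy / repeated copy
    d = hidden.size(-1)

    # Backward attention: each original-copy token queries the repeated copy,
    # whose states encode "future" context from the first pass over the text.
    scores = first @ second.T / d ** 0.5                   # (n, n)
    weights = torch.softmax(scores, dim=-1)
    retrieved = weights @ second                            # (n, d) future-aware context

    # Combine original and retrieved representations (simple average here,
    # chosen only for illustration).
    return (first + retrieved) / 2                          # enhanced token embeddings
```

A usage example would be `reba_style_embeddings("The bank raised interest rates.")`, which returns one enhanced embedding per token of the original sentence; sentence-level embeddings could then be obtained by pooling these vectors.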
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16300