Pause-Tuning for Long-Context Comprehension: A Lightweight Approach to LLM Attention Recalibration

Published: 10 Jun 2025 · Last Modified: 10 Jun 2025 · LCFM 2025 · CC BY 4.0
Keywords: long context understanding, pause token, lost-in-the-middle
TL;DR: We propose pause-tuning, a lightweight yet effective technique that redistributes attention to address the lost-in-the-middle problem, enhancing LLMs' comprehension of long-context inputs.
Abstract: LLMs have demonstrated remarkable proficiency in language understanding tasks but continue to struggle with long-context comprehension, particularly with content located in the middle of extensive inputs. This limitation, known as the Lost-in-the-Middle (LITM) problem, hinders models from fully processing and utilizing information across lengthy contexts. To address this issue, we introduce pause-tuning, a technique that redistributes attention to enhance comprehension of long-context inputs. Our approach fine-tunes language models on datasets with inserted pause tokens, segmenting inputs into manageable parts. We evaluate pause-tuning against alternative approaches using the Needle-in-a-Haystack (NIAH) and LongBench v2 benchmarks, in which models must, respectively, retrieve specific information from long contexts and answer challenging multiple-choice questions about them. Experimental results demonstrate significant performance gains, suggesting that pause-tuning successfully redistributes attention and improves long-context retention. We also observe marked changes in the attention distribution when pause-tuning is applied. The code and data are available at https://anonymous.4open.science/r/LITM-PauseTokens-7357.
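The abstract describes fine-tuning on inputs augmented with pause tokens that segment long contexts into manageable parts. The sketch below illustrates one way such insertion could be done as a data-preprocessing step; the pause-token string, the whitespace-based segmentation, and the segment length are illustrative assumptions, not the authors' exact recipe, which is provided in the linked repository.

```python
# Minimal sketch of pause-token insertion for preparing fine-tuning data.
# Assumptions (not from the paper): the literal token "<pause>", a segment
# length of 512 whitespace-delimited words, and word-level splitting as a
# stand-in for real subword tokenization.

def insert_pause_tokens(text: str,
                        segment_len: int = 512,
                        pause_token: str = "<pause>") -> str:
    """Split `text` into fixed-size word chunks and join them with a pause token."""
    words = text.split()
    segments = [
        " ".join(words[i:i + segment_len])
        for i in range(0, len(words), segment_len)
    ]
    return f" {pause_token} ".join(segments)


if __name__ == "__main__":
    long_context = "filler " * 2000  # toy long input
    augmented = insert_pause_tokens(long_context)
    print(augmented.count("<pause>"))  # number of pause tokens inserted
```

In a full pipeline, the augmented texts would then be tokenized (with the pause token registered as a special token) and used to fine-tune the model so that attention can recalibrate at segment boundaries.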
Submission Number: 3