Reflection-Window Decoding: Text Generation with Selective Refinement

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose reflection-window decoding for text generation to address the inherent shortcoming of the purely autoregressive decoding approach.
Abstract: Autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal due to the lack of a built-in mechanism to refine and/or correct the generated content. In this paper, we consider optimality in terms of the joint probability over all tokens of the generated response, considered simultaneously. We theoretically characterize the potential deviation of the autoregressively generated response from its globally optimal counterpart of the same length. Our analysis suggests that caution is warranted when noticeable uncertainty arises during text generation, as it may signal the sub-optimality of the generation history. To address this pitfall of autoregressive decoding, we propose an approach that incorporates a sliding reflection window and a pausing criterion, such that refinement and generation can be interleaved as decoding proceeds. Our selective refinement framework strikes a balance between efficiency and optimality, and our extensive experimental results demonstrate the effectiveness of our approach.
Lay Summary: Current language models generate text one word at a time based on history (the "autoregressive" way), without a built-in ability to go back and fix previous mistakes. This is like writing an essay where you can never use the backspace key: once a word is written, it stays forever. Our research shows that this approach leads to suboptimal text that could have been better if the model could revise its work as the text generation unfolds. We developed "Reflection-Window Decoding," which gives language models the ability to pause and revise recent text before continuing. Using mathematical analysis, we find that the uncertainty about what to say next can serve as a signal that revision might be needed. Our approach selectively regenerates problematic sections upon reflection, allowing real-time correction without starting over completely. Our work reconsiders how language models should handle self-correction. Rather than relying solely on models' (potentially unreliable) high-level behavior to reflect on and revise complete responses after generation, our approach demonstrates the value of building correction mechanisms directly into the decoding process itself. By enabling real-time refinement as text unfolds, we enable language models to write and revise simultaneously, just as humans do when writing.
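The decoding loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pausing criterion is approximated here by the Shannon entropy of the next-token distribution crossing a threshold, and `step_fn`, `refine_fn`, the window size, and the threshold `tau` are all hypothetical names and values chosen for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def decode_with_reflection(step_fn, refine_fn, max_len, window=4, tau=1.0):
    """Generate up to max_len tokens, pausing to refine when uncertain.

    step_fn(tokens) -> (next_token, probs): one autoregressive step (assumed API).
    refine_fn(tokens, window) -> list: rewrites the last `window` tokens (assumed API).
    When the entropy of the predictive distribution exceeds `tau`, the
    last `window` tokens are selectively regenerated before continuing.
    """
    tokens = []
    while len(tokens) < max_len:
        token, probs = step_fn(tokens)
        tokens.append(token)
        if entropy(probs) > tau and len(tokens) >= window:
            tokens[-window:] = refine_fn(tokens, window)  # selective refinement
    return tokens
```

A toy model that emits a peaked (confident) distribution at every step except one uniform (uncertain) step will trigger exactly one refinement of the trailing window, leaving the earlier tokens untouched.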
Primary Area: Deep Learning->Large Language Models
Keywords: Autoregressive Decoding, Text Generation, Reflection Window, Selective Refinement, Large Language Model
Submission Number: 7609