Keywords: Text Generation, Masked Diffusion Models, Block Diffusion, Process Reward Models, PRM, Iterative Refinement, Error Correction, Inference-Time Guidance, Self-Correction, Mathematical Reasoning, Computational Efficiency, Windowed Evaluation, LLaDA, Qwen2.5-Math-PRM
TL;DR: R3 improves text generation from masked diffusion models by using a PRM to identify and refine errors in blocks at inference time, boosting math problem-solving accuracy efficiently.
Abstract: A key challenge for iterative text generation is enabling models to efficiently identify and correct their own errors. We propose Review, Remask, Refine (R3), a simple yet elegant framework that requires no additional model training and can be applied to any pre-trained masked text diffusion model (e.g., LLaDA or BD3-LM). In R3, a Process Reward Model (PRM) performs the $\textbf{Review}$ of intermediate generated blocks. The framework then translates these PRM scores into a $\textbf{Remask}$ strategy: the lower a block's PRM score, indicating potential mistakes, the greater the proportion of that block's tokens that is remasked. Finally, the model is compelled to $\textbf{Refine}$ these targeted segments, focusing its effort on the specific sub-optimal parts of past generations and thereby improving the final output.
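The Remask step described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`MASK_ID`, `remask_fractions`, `apply_remask`), the linear score-to-fraction mapping, and the random choice of which tokens to remask are all assumptions made here for clarity.

```python
import random

MASK_ID = -1  # placeholder for the diffusion model's mask-token id (assumed)

def remask_fractions(prm_scores, min_frac=0.1, max_frac=0.9):
    """Map per-block PRM scores in [0, 1] to remask fractions.

    Lower score -> larger fraction remasked: score 1.0 maps to
    min_frac, score 0.0 maps to max_frac (linear interpolation;
    the actual mapping in R3 may differ).
    """
    return [max_frac - s * (max_frac - min_frac) for s in prm_scores]

def apply_remask(blocks, prm_scores, rng=random):
    """Remask a score-dependent fraction of tokens in each block.

    `blocks` is a list of token-id lists; tokens chosen for
    remasking are replaced with MASK_ID so the diffusion model can
    regenerate (Refine) them on the next pass.
    """
    remasked = []
    for block, frac in zip(blocks, remask_fractions(prm_scores)):
        k = round(frac * len(block))
        idx = set(rng.sample(range(len(block)), k))
        remasked.append([MASK_ID if i in idx else t
                         for i, t in enumerate(block)])
    return remasked
```

A low-scoring block (e.g. PRM score 0.0) thus has most of its tokens remasked and regenerated, while a high-scoring block is left largely intact.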
Code: ipynb
Submission Number: 93