ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · License: CC BY 4.0
Abstract: Self-awareness, i.e., the ability to assess and correct one's generation, is a fundamental aspect of human intelligence, making its replication in large language models (LLMs) an important yet challenging task. Previous works tackle this by employing extensive reinforcement learning or relying on large external verifiers. In this work, we propose Refine via Intrinsic Self-Verification (ReVISE), an efficient and effective framework that enables LLMs to self-correct their outputs through self-verification. The core idea of ReVISE is to enable LLMs to verify their reasoning processes and continually rethink reasoning trajectories based on that verification. To implement this efficiently, we introduce a structured curriculum based on preference learning. Specifically, as ReVISE involves two challenging tasks (i.e., self-verification and reasoning correction), we tackle each task sequentially using curriculum learning, collecting both failed and successful reasoning paths to construct preference pairs for efficient training. During inference, our approach enjoys natural test-time scaling by integrating self-verification and correction capabilities, further enhanced by our proposed confidence-aware decoding mechanism. Our experiments on various reasoning tasks demonstrate that ReVISE achieves efficient self-correction and significantly improves the reasoning performance of LLMs.
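To make the inference-time loop concrete, below is a minimal sketch of verify-then-refine decoding as the abstract describes it: the model generates a reasoning trajectory, scores its own "accept" vs. "refine" verdict, and regenerates when its confidence in acceptance is low. The function names (`generate`, `verification_logprobs`), the threshold value, and the refinement prompt are hypothetical placeholders, not the authors' API; see the linked repository for the actual implementation.

```python
import math
from typing import Callable, Tuple


def revise_decode(
    generate: Callable[[str], str],
    verification_logprobs: Callable[[str, str], Tuple[float, float]],
    prompt: str,
    max_refinements: int = 2,
    accept_threshold: float = 0.5,
) -> str:
    """Sketch of self-verify-then-refine inference with confidence-aware
    acceptance: keep a trajectory only if the model's own probability of
    the "accept" verdict clears a threshold; otherwise rethink.

    `generate` maps a prompt to a reasoning trajectory; `verification_logprobs`
    returns the model's log-probabilities for hypothetical "accept" and
    "refine" verification tokens given the prompt and trajectory.
    """
    trajectory = generate(prompt)
    for _ in range(max_refinements):
        accept_lp, refine_lp = verification_logprobs(prompt, trajectory)
        # Normalize the two verdict probabilities into a confidence score.
        p_accept = math.exp(accept_lp) / (math.exp(accept_lp) + math.exp(refine_lp))
        if p_accept >= accept_threshold:
            break  # the model judges its own reasoning reliable; stop here
        # Otherwise, regenerate conditioned on the rejected attempt.
        trajectory = generate(
            prompt + "\n[Previous attempt judged unreliable]\n" + trajectory
        )
    return trajectory
```

Because each rejected trajectory triggers another generation pass, accuracy can be traded for compute by raising `max_refinements` or the acceptance threshold, which is the natural test-time scaling the abstract refers to.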
Lay Summary: Large language models (LLMs) can produce impressive answers, but they sometimes make small mistakes in their reasoning, which can affect the final result. This paper introduces ReVISE, a method that allows LLMs to check and improve their own reasoning during inference without external feedback or additional supervision. ReVISE trains the model to spot when its reasoning might be unreliable and to decide whether to keep going or change its answer. It uses a two-stage training process and confidence-aware decoding to guide these decisions. This method helps models perform better at tasks that require step-by-step reasoning, such as math and logic problems. It also moves LLMs closer to being self-correcting, trustworthy AI systems.
Link To Code: https://github.com/seunghyukoh/revise
Primary Area: Deep Learning->Large Language Models
Keywords: Self-correct, Test-time compute, Large Language Model, Self-verify, Self-awareness
Submission Number: 15061