Keywords: Reasoning models; Chain-of-thought reasoning (CoT); Intermediate answers; Overthinking
TL;DR: Reasoning models with long chains of thought encode strong signals about the correctness of intermediate answers in their hidden states, and these signals can be used for early exit.
Abstract: Reasoning models have achieved remarkable performance on tasks like math and logical reasoning thanks to their ability to search during reasoning. However, they still suffer from \textit{overthinking}, often performing unnecessary reasoning steps even after reaching the correct answer. This raises the question: \textit{can models evaluate the correctness of their intermediate answers during reasoning?}
In this work, we study whether reasoning models encode information about answer correctness by probing the model's hidden states. The resulting probe verifies intermediate answers with high accuracy and produces well-calibrated scores. Additionally, we find that models' hidden states encode the correctness of future answers, enabling early prediction of correctness before the intermediate answer is fully formulated.
We then use the probe as a verifier to decide whether to exit reasoning at intermediate answers during inference, reducing the number of inference tokens by 24\% without compromising performance. These findings confirm that reasoning models do encode a notion of correctness yet fail to exploit it, revealing substantial untapped potential to enhance their efficiency.
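To make the probe-as-verifier idea concrete, below is a minimal sketch in Python. It is illustrative only, not the authors' implementation: the logistic-regression probe, the `generate_until_intermediate_answer` helper, the 0.9 exit threshold, and the array shapes are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# --- Training the probe (offline) ---
# Assumed inputs, gathered from long CoT traces:
#   hidden_states: (n_examples, d_model) hidden state at the token position
#                  where each intermediate answer is stated
#   labels:        (n_examples,) 1 if that intermediate answer is correct, else 0
def train_probe(hidden_states: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states, labels)
    return probe

# --- Using the probe as an early-exit verifier (inference) ---
# `generate_until_intermediate_answer` is a hypothetical helper that runs the
# reasoning model until its next intermediate answer and returns that answer
# together with the hidden state at its final token.
def reason_with_early_exit(model, prompt, probe, threshold=0.9, max_rounds=16):
    context = prompt
    answer = None
    for _ in range(max_rounds):
        answer, hidden = generate_until_intermediate_answer(model, context)
        p_correct = probe.predict_proba(hidden.reshape(1, -1))[0, 1]
        if p_correct >= threshold:
            return answer  # probe judges the answer correct: stop reasoning here
        context = context + answer  # otherwise let the model keep reasoning
    return answer  # fall back to the last intermediate answer
```

In this sketch, the threshold trades tokens for accuracy: a higher threshold exits less often but more safely, which is one way such a verifier could yield token savings without hurting final performance.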
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1704