Keywords: Large reasoning model, Decoding, Reasoning
TL;DR: This paper presents Contrastive Thinking Decoding, which improves answers and mitigates disagreement between the thinking trace and the final answer via token-level extrapolation between a primary thinking trace and a noisy reference.
Abstract: Large reasoning models (LRMs) expose an explicit thinking phase prior to the answer. While recent work has focused on optimizing the thinking phase, the answer phase, given a thinking trace, remains under-explored. This paper investigates the behavior of the answer phase. First, the answer can diverge from the thinking trace even when the trace already contains the correct solution, and the converse can also occur. Second, budgeted thinking alters the answer in non-obvious ways: small budgets trigger extra reasoning in the answer, while large budgets move verification into the thinking phase, yet drift can remain. Third, prompts that appear only inside the thinking block steer the answer pattern and provide practical control over answer behavior. Motivated by these observations, we propose Contrastive Thinking Decoding (CTD), a test-time logit-correction method that explicitly targets answer-phase alignment. Unlike prior contrastive decoding, which contrasts outputs from a strong model against an auxiliary weaker model, CTD operates within a single model by contrasting the primary thinking trace with a deliberately perturbed noisy trace. This contrast steers token-level decoding in the answer phase, requires no additional training, and preserves budget control. Across standard math reasoning benchmarks (e.g., MATH500, AIME'24/'25) and code benchmarks, CTD achieves higher accuracy at similar or lower token counts and reduces mismatch between the provided thinking and the final answer.
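The core logit correction described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the function name `ctd_logits`, the extrapolation coefficient `alpha`, and the exact form `lp + alpha * (lp - ln)` are illustrative choices consistent with standard contrastive-decoding extrapolation, where `lp` are next-token logits conditioned on the primary thinking trace and `ln` are logits conditioned on the perturbed (noisy) trace.

```python
import numpy as np

def ctd_logits(logits_primary, logits_noisy, alpha=1.0):
    """Toy sketch of contrastive thinking decoding (assumed form).

    Extrapolates away from the logits produced under a noisy
    (perturbed) thinking trace, amplifying directions that the
    primary trace supports relative to the noisy reference.
    """
    lp = np.asarray(logits_primary, dtype=float)
    ln = np.asarray(logits_noisy, dtype=float)
    return lp + alpha * (lp - ln)

# Toy vocabulary of 3 tokens: the primary trace strongly favors
# token 2, while the noisy trace is closer to uniform.
primary = [0.1, 0.2, 2.0]
noisy = [0.1, 0.2, 0.5]
adjusted = ctd_logits(primary, noisy, alpha=1.0)
print(adjusted.tolist())        # [0.1, 0.2, 3.5]
print(int(np.argmax(adjusted))) # 2
```

At answer-phase decoding time, each step would pick (or sample) the next token from `adjusted` instead of the raw logits, which is how the contrast steers token-level decoding without any additional training.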
Primary Area: generative models
Submission Number: 10633