Keywords: Large reasoning model, Decoding, Reasoning
TL;DR: This paper presents Contrastive Thinking Decoding, which improves answers and mitigates disagreement between the thinking trace and the final answer via token-level extrapolation between a primary thinking trace and a noisy reference.
Abstract: Large reasoning models (LRMs) expose an explicit thinking phase prior to the answer. While recent work has focused on optimizing the thinking phase, the answer phase, given a thinking trace, remains under-explored. This paper investigates the behavior of the answer phase. First, the answer can diverge from the thinking trace even when the trace already contains the correct solution, and the converse can also occur. Second, budgeted thinking alters the answer in non-obvious ways: small budgets trigger extra reasoning in the answer, while large budgets move verification into the thinking phase, yet drift can remain. Third, prompts that appear only inside the thinking block steer the answer pattern and provide practical control over answer behavior. Motivated by these observations, we propose Contrastive Thinking Decoding (CTD), a test-time logit-correction method that explicitly targets answer-phase alignment. Unlike prior contrastive decoding, which contrasts outputs from a strong model against an auxiliary weaker model, CTD operates within a single model by contrasting the primary thinking trace with a deliberately perturbed noisy trace. This contrast steers token-level decoding in the answer phase, requires no additional training, and preserves budget control. Across standard math reasoning benchmarks (e.g., MATH500, AIME'24/'25) and code benchmarks, CTD achieves higher accuracy at similar or lower token counts and reduces mismatch between the provided thinking and the final answer.
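The core logit correction described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the function name `ctd_logits`, the extrapolation coefficient `alpha`, and the exact form `lp + alpha * (lp - ln)` are illustrative choices consistent with standard contrastive-decoding extrapolation, where `lp` are next-token logits conditioned on the primary thinking trace and `ln` are logits conditioned on the perturbed (noisy) trace.

```python
import numpy as np

def ctd_logits(logits_primary, logits_noisy, alpha=1.0):
    """Toy sketch of contrastive thinking decoding (assumed form).

    Extrapolates away from the logits produced under a noisy
    (perturbed) thinking trace, amplifying directions that the
    primary trace supports relative to the noisy reference.
    """
    lp = np.asarray(logits_primary, dtype=float)
    ln = np.asarray(logits_noisy, dtype=float)
    return lp + alpha * (lp - ln)

# Toy vocabulary of 3 tokens: the primary trace strongly favors
# token 2, while the noisy trace is closer to uniform.
primary = [0.1, 0.2, 2.0]
noisy = [0.1, 0.2, 0.5]
adjusted = ctd_logits(primary, noisy, alpha=1.0)
print(adjusted.tolist())        # [0.1, 0.2, 3.5]
print(int(np.argmax(adjusted))) # 2
```

At answer-phase decoding time, each step would pick (or sample) the next token from `adjusted` instead of the raw logits, which is how the contrast steers token-level decoding without any additional training.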
Primary Area: generative models
Submission Number: 10633