Information Gap in Chain-of-Thought Induces Implicit Thinking that Fails in Length Generalization

ICLR 2026 Conference Submission 19828 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Chain-of-Thought, reasoning, generalization, large language models, synthetic math dataset, probing
Abstract: Recent work reveals that Chain-of-Thought (CoT) rationales may not faithfully reflect a model's actual reasoning, as their semantics can diverge from the model's underlying "implicit thoughts". In this work, using a synthetic dataset with controllable complexity, we find signs of implicit thinking in models after supervised finetuning (SFT) on CoT rationales: the models internally identify all the variables that must be solved before generating the actual CoT. This implicit thinking ability degrades sharply once the required CoT steps exceed those seen during training, preventing the model from generalizing to more complex problems. To understand why implicit thinking emerges during SFT on explicit CoT rationales, we first define the "information gap" within a CoT as the ratio of unexplored actions to all admissible actions at each state. We hypothesize that a large information gap (many admissible but unexplored actions) forces LLMs to justify the actions taken in the golden CoT by looking for clues in their internal representations, leading to implicit thinking. We benchmark four types of CoT, each based on a different graph traversal heuristic, and observe a positive correlation between the magnitude of the information gap in the CoTs and the degree of implicit thinking in models finetuned on them. We further support this hypothesis by showing that actively reducing the information gap, by including multiple CoT trajectories per question, reduces implicit thinking and enhances generalization to more complex questions. Overall, our findings suggest rethinking the role of CoT in LLM reasoning and point to necessary conditions for learning generalizable CoT.
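To make the information-gap definition concrete, here is a minimal formalization consistent with the abstract's wording; the symbols $\mathrm{IG}$, $\mathcal{A}$, and $\mathcal{E}$ are our own notation, not taken from the paper:

\[
\mathrm{IG}(s_t) \;=\; \frac{\left|\mathcal{A}(s_t)\setminus \mathcal{E}(s_t)\right|}{\left|\mathcal{A}(s_t)\right|},
\]

where $\mathcal{A}(s_t)$ denotes the set of admissible actions at state $s_t$ along the CoT trajectory and $\mathcal{E}(s_t)\subseteq\mathcal{A}(s_t)$ the actions the golden CoT actually explores. Under this reading, a value near 1 means most admissible actions go unexplored (a large gap), while including multiple CoT trajectories per question enlarges $\mathcal{E}(s_t)$ and drives $\mathrm{IG}(s_t)$ toward 0.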
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19828