Think When You Need: Self-Adaptive Chain-of-Thought Learning

Junjie Yang; Ke Lin; XingYu

18 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: LLM, CoT, efficient learning

Abstract: Chain of Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems. We find that existing approaches, which directly penalize reasoning length, suffer from hyperparameter sensitivity and limited generalizability, especially on fuzzy tasks where ground truth is unavailable. Our approach instead constructs rewards from length and quality comparisons, guided by theoretical assumptions that jointly promote solution correctness and conciseness. The method extends naturally to both verifiable tasks with definitive answers and fuzzy tasks requiring subjective evaluation. Experiments across multiple reasoning benchmarks show that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
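
To make the idea of comparison-based rewards concrete, here is a minimal illustrative sketch. It is not the paper's formulation: the abstract does not specify the reward, so the pairwise scheme, the `Sample` type, the `pairwise_rewards` function, and the `length_bonus` parameter are all assumptions introduced for illustration. The sketch scores each sampled response against the others in its group: a correct response earns credit for every incorrect one it beats, and a smaller credit for every correct one it is shorter than.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One sampled response (hypothetical container, not from the paper)."""
    correct: bool  # quality signal; on fuzzy tasks a judge's preference could stand in
    length: int    # number of generated tokens


def pairwise_rewards(samples: List[Sample], length_bonus: float = 0.5) -> List[float]:
    """Score each sample by comparing it with every other sample in the group.

    Assumed rules (illustrative only):
      - correct vs. incorrect: the correct sample earns 1.0
      - both correct: the strictly shorter sample earns `length_bonus`
      - otherwise: no credit
    Scores are averaged over the n - 1 comparisons.
    """
    n = len(samples)
    rewards = [0.0] * n
    if n < 2:
        return rewards  # nothing to compare against
    for i, a in enumerate(samples):
        for j, b in enumerate(samples):
            if i == j:
                continue
            if a.correct and not b.correct:
                rewards[i] += 1.0
            elif a.correct and b.correct and a.length < b.length:
                rewards[i] += length_bonus
    return [r / (n - 1) for r in rewards]


group = [Sample(True, 100), Sample(True, 300), Sample(False, 50)]
print(pairwise_rewards(group))  # [0.75, 0.5, 0.0]
```

In this sketch length never enters the reward as an absolute penalty; it only breaks ties among correct responses, so a short incorrect answer cannot outscore a long correct one.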

Primary Area: foundation or frontier models, including LLMs

Submission Number: 10993
