Think When You Need: Self-Adaptive Chain-of-Thought Learning

18 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: LLM, CoT, efficient learning
Abstract: Chain-of-Thought (CoT) reasoning improves language models' performance but often leads to inefficient "overthinking" on simple problems. We find that existing approaches that directly penalize reasoning length suffer from hyperparameter sensitivity and limited generalizability, especially on fuzzy tasks where ground truth is unavailable. Our approach instead constructs rewards from length and quality comparisons, guided by theoretical assumptions that jointly promote solution correctness and conciseness. The method extends naturally to both verifiable tasks with definitive answers and fuzzy tasks requiring subjective evaluation. Experiments across multiple reasoning benchmarks show that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10993
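
The abstract describes rewards built from length and quality comparisons rather than a direct length penalty. The page gives no formula, so the following is only a minimal sketch of what a comparison-based reward could look like for a verifiable task. The function name `pairwise_rewards`, the weight `alpha`, and the normalization are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence


def pairwise_rewards(
    correct: Sequence[bool],
    lengths: Sequence[int],
    alpha: float = 0.5,  # hypothetical weight for winning a length comparison
) -> List[float]:
    """Illustrative comparison-based reward over a group of sampled responses.

    Assumption (not from the paper): each response is compared with every
    other response in the group. It earns 1.0 for being correct when the
    other is incorrect (quality comparison) and `alpha` for being strictly
    shorter when both are correct (length comparison). Incorrect responses
    earn nothing, so brevity alone is never rewarded.
    """
    n = len(correct)
    if n != len(lengths):
        raise ValueError("correct and lengths must have the same size")

    rewards = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if correct[i] and not correct[j]:
                rewards[i] += 1.0    # quality comparison won
            elif correct[i] and correct[j] and lengths[i] < lengths[j]:
                rewards[i] += alpha  # length comparison won among correct answers

    # Average over the number of opponents so group size does not scale the reward.
    if n > 1:
        rewards = [r / (n - 1) for r in rewards]
    return rewards


if __name__ == "__main__":
    # Two correct responses (10 and 50 tokens) and one short incorrect response.
    print(pairwise_rewards([True, True, False], [10, 50, 5]))
```

In this sketch the short correct response scores highest and the short incorrect response scores zero, which reflects the stated goal of conciseness without sacrificing correctness. For fuzzy tasks, the boolean `correct` flags would have to be replaced by some pairwise quality judgment, which this page does not specify.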