When More is Less: Understanding Chain-of-Thought Length in LLMs

Published: 05 Mar 2025, Last Modified: 20 Mar 2025
Venue: Reasoning and Planning for LLMs @ ICLR 2025
License: CC BY 4.0
Keywords: Chain of Thought Reasoning, Large Language Model, Overthinking
Abstract: Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs) by breaking complex tasks into smaller, manageable sub-tasks. Researchers have explored ways to guide models toward longer, more elaborate CoT processes, such as long CoT and test-time scaling, to improve LLM reasoning. However, does increasing CoT length consistently improve reasoning accuracy across models and tasks? In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance first improves and then declines. To explain this phenomenon, we present evidence that *longer reasoning processes are increasingly susceptible to noise.* We theoretically prove the existence of an optimal number of reasoning steps and derive a scaling law for this optimal CoT length as a function of model capability and task difficulty. Motivated by our theory, we propose length-aware majority voting to mitigate the effects of excessively long or short CoTs, and verify it on both synthetic and real-world datasets. Our findings highlight the need to calibrate CoT length to model capabilities and task demands, offering a principled framework for optimizing multi-step reasoning in LLMs.
Submission Number: 126
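
The abstract proposes length-aware majority voting but does not spell out its formulation on this page. Below is a minimal Python sketch of the idea under illustrative assumptions: each sampled answer is weighted by a Gaussian penalty on how far its CoT length deviates from an estimated optimal step count, and the answer with the largest total weight wins. The Gaussian weighting, the `optimal_len` input, and `sigma` are assumptions for illustration, not the paper's exact method.

```python
# Minimal sketch of length-aware majority voting (illustrative, not the
# paper's exact formulation): votes from CoTs far from an estimated
# optimal length are down-weighted before aggregation.
from collections import defaultdict
import math


def length_aware_majority_vote(samples, optimal_len, sigma=2.0):
    """Pick the answer with the highest length-weighted vote mass.

    samples: list of (answer, num_reasoning_steps) pairs sampled from the model.
    optimal_len: estimated optimal number of CoT steps (hypothetical input).
    sigma: width of the Gaussian penalty on deviation from optimal_len.
    """
    scores = defaultdict(float)
    for answer, n_steps in samples:
        # Down-weight answers whose CoT is much shorter or longer
        # than the estimated optimal length.
        weight = math.exp(-((n_steps - optimal_len) ** 2) / (2 * sigma ** 2))
        scores[answer] += weight
    return max(scores, key=scores.get)


# Usage: five sampled (answer, CoT length) pairs; the "42" votes with
# near-optimal lengths outweigh the overlong "7" votes.
votes = [("42", 4), ("42", 5), ("7", 12), ("42", 3), ("7", 11)]
print(length_aware_majority_vote(votes, optimal_len=4))  # -> "42"
```

Note that as sigma grows large all weights approach 1, recovering plain majority voting; the length penalty reweights votes rather than discarding them.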