Supervised Chain of Thought

22 Sept 2024 (modified: 03 Dec 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Chain of Thought, LLMs, Prompting, CoT, Model Analysis
TL;DR: Chain of Thought requires human supervision to reduce the complexity of the prompt space, enabling effective convergence to the optimal solution.
Abstract: Large Language Models (LLMs) have revolutionized the field of natural language processing and hold significant promise for advancements in Artificial Intelligence. However, the backbone architecture of most mainstream LLMs, the Transformer, has inherent limitations on computational depth, making it theoretically incapable of solving many reasoning tasks that require increasing depth. Chain of Thought (CoT) techniques have been shown to mitigate these architectural limitations, as demonstrated by several theoretical works, offering a viable approach to solving complex reasoning tasks that were previously out of reach. Despite these successes, CoT and its variants (such as Tree of Thought and Graph of Thought) follow a one-prompt-for-all-tasks approach: they rely on a single prompt structure (e.g., "think step by step") for a wide range of tasks, from counting to sorting, and from solving mathematical problems to tackling algorithmic challenges. This makes it difficult for the model to generate the correct step template for each task, since doing so requires searching a large prompt-template space. In this work, we build on previous theoretical analyses of CoT to demonstrate how the "one-prompt-for-all-tasks" template can negatively impact the computability of LLMs. We divide the solution space into prompt space and answer space, showing that the CoT process requires task-specific supervision to accurately navigate the prompt space and achieve optimal performance. Through experiments with the latest LLMs, we reveal a significant gap in reasoning ability when supervision is applied versus when it is not. Our aim is to provide insights into the mechanisms behind CoT and to inspire the effective design of CoT variants. Additionally, we highlight the key limitations of traditional "unsupervised" prompting approaches, suggesting the need for more nuanced, task-specific "supervised" CoT for effective reasoning with LLMs.
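To make the abstract's distinction concrete, below is a minimal Python sketch contrasting an "unsupervised" one-prompt-for-all-tasks CoT prompt with a task-specific "supervised" one. The example task, the prompt wording, and the `query_llm` stub are illustrative assumptions, not the paper's actual templates or experimental setup.

```python
# A minimal sketch of "unsupervised" vs. "supervised" CoT prompting.
# The task, templates, and query_llm stub below are hypothetical.

TASK = "Sort the list [3, 1, 4, 1, 5] in ascending order."

# One-prompt-for-all-tasks: the model must search the large
# prompt-template space on its own to find a suitable step structure.
unsupervised_prompt = f"{TASK}\nLet's think step by step."

# Task-specific supervision: a human narrows the prompt space by
# fixing the step template the model should follow for this task.
supervised_prompt = (
    f"{TASK}\n"
    "Solve it with these steps:\n"
    "1. Repeatedly scan the list and swap adjacent out-of-order pairs.\n"
    "2. After each full pass, write out the current list.\n"
    "3. Stop when a pass makes no swaps, then report the sorted list."
)

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; any chat-completion API would fit."""
    raise NotImplementedError

for name, prompt in [("unsupervised", unsupervised_prompt),
                     ("supervised", supervised_prompt)]:
    print(f"--- {name} CoT prompt ---\n{prompt}\n")
```

Under the paper's framing, the supervised template collapses the prompt-space search to a single candidate, leaving the model only the answer space to navigate.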
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2701