Abstract: Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs. Interestingly, we observe that CoT reasoning and self-training share the same core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses two limitations of previous CoT approaches, namely over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments show that the proposed method achieves significant advantages in both performance and computational efficiency. Our code is available at: https://github.com/zongqianwu/ST-COT.
Lay Summary: Large language models (LLMs), like ChatGPT, have shown impressive abilities in answering complex questions. One popular method for improving their thinking process is called chain-of-thought reasoning, where the model is encouraged to think step by step before giving a final answer. In this work, we find that this thinking process is similar to how humans learn by reviewing their own answers and improving over time. Based on this idea, we propose a new approach that helps LLMs think more effectively. It includes a smart way to guide the model at the start and a strategy to improve its thinking with each step.
Link To Code: https://github.com/zongqianwu/ST-COT
Primary Area: Deep Learning->Large Language Models
Keywords: Chain-of-Thought, Self-Training, Reasoning, LLM
Submission Number: 3053