Abstract: Chain-of-thought (CoT) reasoning has emerged as an effective approach for activating latent capabilities in LLMs. Interestingly, we observe that CoT reasoning and self-training share the same core objective: iteratively leveraging model-generated information to progressively reduce prediction uncertainty. Building on this insight, we propose a novel CoT framework to improve reasoning performance. Our framework integrates two key components: (i) a task-specific prompt module that optimizes the initial reasoning process, and (ii) an adaptive reasoning iteration module that dynamically refines the reasoning process and addresses two limitations of previous CoT approaches, namely over-reasoning and high similarity between consecutive reasoning iterations. Extensive experiments show that the proposed method achieves significant advantages in both performance and computational efficiency. Our code is available at: https://github.com/zongqianwu/ST-COT.
Lay Summary: Large language models (LLMs), like ChatGPT, have shown impressive abilities in answering complex questions. One popular method for improving their thinking process is called chain-of-thought reasoning, where the model is encouraged to think step by step before giving a final answer. In this work, we find that this thinking process is similar to how humans learn by reviewing their own answers and improving over time. Based on this idea, we propose a new approach that helps LLMs think more effectively. It includes a smart way to guide the model at the start and a strategy to improve its thinking with each step.
Link To Code: https://github.com/zongqianwu/ST-COT
Primary Area: Deep Learning->Large Language Models
Keywords: Chain-of-Thought, Self-Training, Reasoning, LLM
Submission Number: 3053