DEBATE, TRAIN, EVOLVE: Self‑Evolution of Language Model Reasoning

ACL ARR 2025 May Submission4528 Authors

20 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Large language models (LLMs) have improved significantly in their reasoning through extensive training on massive datasets. However, relying solely on additional data for improvement is becoming increasingly impractical, highlighting the need for models to autonomously enhance their reasoning without external supervision. In this paper, we propose $\textbf{Debate, Train, Evolve (DTE)}$, a novel ground-truth-free training framework that uses multi-agent debate traces to evolve a single language model. We also introduce a new prompting strategy, $\textbf{Reflect-Critique-Refine}$, which improves debate quality by explicitly instructing agents to critique and refine their reasoning. Extensive evaluations on $\textbf{five}$ reasoning benchmarks with $\textbf{six}$ open-weight models show that our DTE framework achieves substantial improvements, with an average accuracy gain of $\textbf{8.92\%}$ on the challenging GSM-PLUS dataset. Furthermore, we observe strong cross-domain generalization, with an average accuracy gain of $\textbf{5.8\%}$ on all other benchmarks, suggesting that our method captures general reasoning capabilities.

Paper Type: Long

Research Area: Language Modeling

Research Area Keywords: fine-tuning, continual learning, LLM/AI agents, prompting

Contribution Types: Model analysis & interpretability, Data analysis

Languages Studied: English

Submission Number: 4528
