OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning

ACL ARR 2025 July Submission1059 Authors

29 Jul 2025 (modified: 21 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have shown remarkable reasoning capabilities in mathematical and scientific tasks. To enhance complex reasoning, multi-agent systems have been proposed to harness the collective intelligence of LLM agents. However, existing collaboration structures are either predefined or rely on majority voting or round-table debates, which can suppress correct but less dominant agent contributions. Recent approaches model multi-agent systems as graph networks but optimize purely for agent performance, neglecting the quality of interactions. We hypothesize that effective agent communication is crucial for multi-agent reasoning and that the quality of debate plays a significant role. To address this, we propose OptAgent, a multi-agent verbal reinforcement learning algorithm that dynamically constructs and refines multi-agent collaboration structures. Our method defines action spaces and a feedback mechanism that evaluates communication robustness and coherence throughout the debate. The final decision is reached through a majority vote over all the agents. We assess OptAgent on various reasoning tasks, including mathematical reasoning, creative writing, scientific reasoning, and numerical sorting. Results demonstrate that our approach significantly outperforms single-agent prompting methods and state-of-the-art multi-agent frameworks on diverse tasks.
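The abstract's final aggregation step — a majority vote over all agents after the debate — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agents here are stand-in callables, the debate loop is a placeholder for the paper's verbal-RL structure refinement, and the function names (`majority_vote`, `run_debate`) are hypothetical.

```python
from collections import Counter
from typing import Callable, List

def majority_vote(answers: List[str]) -> str:
    """Return the most common final answer among the agents."""
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

def run_debate(agents: List[Callable[[str], str]], question: str,
               rounds: int = 2) -> str:
    """Toy debate loop: each round, every agent sees the question plus
    the previous round's answers (a placeholder for real message passing
    over a learned collaboration structure)."""
    answers = [agent(question) for agent in agents]
    for _ in range(rounds - 1):
        context = question + " | prior answers: " + "; ".join(answers)
        answers = [agent(context) for agent in agents]
    return majority_vote(answers)
```

For example, with three stub agents where two return "42" and one returns "7", `run_debate` yields "42".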
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM/AI agents, prompting, applications
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Previous URL: https://openreview.net/forum?id=urwJvRvAMV
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Limitations
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4.1
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Section 4.1
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 4.1
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B4 Elaboration: Ethics
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 4.1
B6 Statistics For Data: Yes
B6 Elaboration: Section 4.1, Appendix
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4.1, Appendix C
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4.1
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4.1
C4 Parameters For Packages: Yes
C4 Elaboration: Section 4.2, Appendix
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Appendix H
Author Submission Checklist: yes
Submission Number: 1059