OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning

ACL ARR 2025 July Submission1059 Authors

29 Jul 2025 (modified: 21 Aug 2025) · ACL ARR 2025 July Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) have shown remarkable reasoning capabilities in mathematical and scientific tasks. To enhance complex reasoning, multi-agent systems have been proposed to harness the collective intelligence of LLM agents. However, existing collaboration structures are either predefined or rely on majority voting or round-table debates, which can suppress correct but less dominant agent contributions. Recent approaches model multi-agent systems as graph networks but optimize purely for agent performance, neglecting the quality of interactions. We hypothesize that effective agent communication is crucial for multi-agent reasoning and that the quality of debate plays a significant role. To address this, we propose OptAgent, a multi-agent verbal reinforcement learning algorithm that dynamically constructs and refines multi-agent collaboration structures. Our method defines action spaces and a feedback mechanism that evaluates communication robustness and coherence throughout the debate. The final decision is reached through a majority vote over all the agents. We assess OptAgent on various reasoning tasks, including mathematical reasoning, creative writing, scientific reasoning, and numerical sorting. Results demonstrate that our approach significantly outperforms single-agent prompting methods and state-of-the-art multi-agent frameworks on diverse tasks.
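The abstract's final aggregation step — a majority vote over all agents after the debate — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agents here are stand-in callables, the debate loop is a placeholder for the paper's verbal-RL structure refinement, and the function names (`majority_vote`, `run_debate`) are hypothetical.

```python
from collections import Counter
from typing import Callable, List

def majority_vote(answers: List[str]) -> str:
    """Return the most common final answer among the agents."""
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

def run_debate(agents: List[Callable[[str], str]], question: str,
               rounds: int = 2) -> str:
    """Toy debate loop: each round, every agent sees the question plus
    the previous round's answers (a placeholder for real message passing
    over a learned collaboration structure)."""
    answers = [agent(question) for agent in agents]
    for _ in range(rounds - 1):
        context = question + " | prior answers: " + "; ".join(answers)
        answers = [agent(context) for agent in agents]
    return majority_vote(answers)
```

For example, with three stub agents where two return "42" and one returns "7", `run_debate` yields "42".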
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM/AI agents, prompting, applications
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Previous URL: https://openreview.net/forum?id=urwJvRvAMV
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Limitations
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4.1
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Section 4.1
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section 4.1
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B4 Elaboration: Ethics
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Section 4.1
B6 Statistics For Data: Yes
B6 Elaboration: Section 4.1, Appendix
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4.1, Appendix C
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4.1
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4.1
C4 Parameters For Packages: Yes
C4 Elaboration: Section 4.2, Appendix
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Appendix H
Author Submission Checklist: yes
Submission Number: 1059