[AML] Orchestrating Multi-Agent Alignment for Large Language Models Through External Memory and Adaptive Reasoning

Authors: THU 2024 Winter AML Submission 11 Authors

Published: 11 Dec 2024 (modified: 18 Dec 2024) · THU 2024 Winter AML Submission · License: CC BY 4.0
Keywords: Multi-Agent Alignment, PRM, External Memory Integration, LLM
TL;DR: This paper proposes a multi-agent framework that combines reasoning, evaluation, and regeneration agents, optimized with PPO and LoRA, to improve the accuracy and transparency of large language models on reasoning tasks.
Abstract: This paper presents a novel multi-agent framework designed to enhance the reasoning capabilities of large language models (LLMs). The proposed system integrates three core agents: a Reasoning Agent, an Evaluation Agent, and a Regeneration Agent, which collaborate in an iterative process of reasoning, evaluation, and refinement. The system employs Proximal Policy Optimization (PPO) for training, providing stable and efficient optimization of the agents' policies, while LoRA adapters enable lightweight and effective parameter updates. A critical component of the system is the ExternalMemory module, which mediates communication between agents and supports efficient fine-tuning. This combination of autonomous decision-making, external memory integration, and adaptive quality control reduces the burden of manual oversight while substantially improving the clarity, rigor, and reliability of the model's inferences. The framework is evaluated empirically and demonstrates improvements in reasoning accuracy and robustness over existing approaches. The results highlight the potential of multi-agent systems to improve the transparency, efficiency, and scalability of LLM-based reasoning, with implications for applications in domains such as education, research, and industry.
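
The following minimal Python sketch illustrates the iterative reasoning-evaluation-regeneration loop and the shared memory that the abstract describes. All identifiers here (ExternalMemory, reasoning_agent, evaluation_agent, regeneration_agent, solve) are illustrative assumptions, not the authors' actual implementation; the PPO training loop and LoRA adapter updates described in the paper are omitted.

```python
# Hypothetical sketch of the three-agent loop with a shared external memory.
# Real agents would be LLM calls trained with PPO and updated via LoRA adapters.
from dataclasses import dataclass, field


@dataclass
class ExternalMemory:
    """Shared store through which the agents exchange intermediate results."""
    records: list = field(default_factory=list)

    def write(self, role: str, content: str) -> None:
        self.records.append({"role": role, "content": content})

    def read(self, role: str) -> list:
        return [r["content"] for r in self.records if r["role"] == role]


def reasoning_agent(question: str, memory: ExternalMemory) -> str:
    # Placeholder: a real agent would prompt an LLM with the question plus
    # any prior evaluation feedback stored in memory.
    feedback = memory.read("evaluation")
    draft = f"Draft answer to '{question}'"
    if feedback:
        draft += f" (revised using: {feedback[-1]})"
    memory.write("reasoning", draft)
    return draft


def evaluation_agent(draft: str, memory: ExternalMemory) -> bool:
    # Placeholder acceptance rule; the paper's Evaluation Agent would produce
    # a quality judgment or critique of the draft reasoning.
    accepted = "revised" in draft
    critique = "accepted" if accepted else "reasoning too shallow, elaborate further"
    memory.write("evaluation", critique)
    return accepted


def regeneration_agent(question: str, memory: ExternalMemory) -> str:
    # Placeholder: re-runs the reasoning step conditioned on the stored critique.
    return reasoning_agent(question, memory)


def solve(question: str, max_rounds: int = 3) -> str:
    """Iterate reasoning -> evaluation -> regeneration until accepted or budget spent."""
    memory = ExternalMemory()
    answer = reasoning_agent(question, memory)
    for _ in range(max_rounds):
        if evaluation_agent(answer, memory):
            break
        answer = regeneration_agent(question, memory)
    return answer


if __name__ == "__main__":
    print(solve("What is 17 * 24?"))
```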
Submission Number: 11
