Keywords: Large language models, multi-agent learning, mechanism design, Nash equilibrium
Abstract: Large Language Models (LLMs) have shown impressive capabilities in natural language generation, yet they remain limited in complex, multi-step reasoning. We propose COMMAND: COMpetitive Multi-AgeNt Delegation, a framework in which a principal LLM assigns tasks to multiple agent LLMs. Agents compete in an environment where utilities depend on both their internal confidence and the principal’s evaluation, incentivizing higher-quality answers that are better aligned with the principal. We establish theoretical guarantees demonstrating that, under fair comparison, multi-agent systems such as COMMAND provably outperform their single-agent counterparts. Moreover, each agent, via online learning, achieves sublinear regret, and its average policy converges to a Nash equilibrium. Empirical evaluations on multiple benchmarks demonstrate that COMMAND yields significant improvements in factual accuracy.
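To make the delegation mechanism described in the abstract concrete, here is a minimal, hypothetical Python sketch of a single COMMAND-style round. The helper names (`query_llm`, `principal_score`), the confidence-weighting parameter `alpha`, and the linear utility are illustrative assumptions only; the paper's actual protocol and utility function may differ.

```python
# Hypothetical sketch of one competitive delegation round:
# each agent answers with a self-reported confidence, the principal
# evaluates the answers, and the highest-utility answer is kept.
from dataclasses import dataclass
import random


@dataclass
class Agent:
    name: str


def query_llm(agent: Agent, task: str) -> tuple[str, float]:
    """Placeholder for an agent LLM call returning (answer, confidence in [0, 1])."""
    return f"{agent.name}'s answer to: {task}", random.random()


def principal_score(task: str, answer: str) -> float:
    """Placeholder for the principal LLM's evaluation of an answer in [0, 1]."""
    return random.random()


def delegation_round(task: str, agents: list[Agent], alpha: float = 0.5) -> str:
    """Utility mixes the agent's confidence with the principal's evaluation
    (alpha is an assumed weighting); the principal keeps the best answer."""
    best_answer, best_utility = "", float("-inf")
    for agent in agents:
        answer, confidence = query_llm(agent, task)
        utility = alpha * confidence + (1 - alpha) * principal_score(task, answer)
        if utility > best_utility:
            best_answer, best_utility = answer, utility
    return best_answer


if __name__ == "__main__":
    agents = [Agent("A"), Agent("B"), Agent("C")]
    print(delegation_round("Who wrote 'On the Origin of Species'?", agents))
```

In this toy setup, repeating `delegation_round` over tasks and letting each agent update its answering policy from its realized utilities would correspond to the online-learning loop the abstract refers to.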
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 20088