Keywords: Large language models, Multi-agent debate, Evaluation
TL;DR: This paper presents the first study of competitive incentives in multi-agent debates, quantifying over-competition behaviors of state-of-the-art LLMs.
Abstract: LLM-based multi-agent systems demonstrate great potential for tackling complex problems, but how competition shapes their behavior remains underexplored.
This paper investigates over-competition in multi-agent debate, in which agents under extreme pressure exhibit unreliable, harmful behaviors that undermine both collaboration and task performance.
To study this phenomenon, we propose HATE, the Hunger Game Debate, a novel experimental framework that simulates debates in a zero-sum competitive arena.
Our experiments, conducted across a range of LLMs and tasks, reveal that competitive pressure significantly stimulates over-competition behaviors and degrades task performance, causing discussions to derail.
We further explore the impact of environmental feedback by adding judge variants; the results indicate that objective, task-focused feedback effectively mitigates over-competition behaviors.
We also probe the post-hoc kindness of LLMs and form a leaderboard to characterize top LLMs, providing insights for understanding and governing the emergent social dynamics of the AI community.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24759