Keywords: Multi-Agent System, Bias Evaluation
Abstract: Recent advances in large language models (LLMs) have led to significant progress
in mitigating social biases at the individual model level. However, a core vulnerability persists: small, stochastic biases can be amplified through multi-step
interaction, leading to skewed system-level outcomes. A promising, yet unverified,
hypothesis is that the architectural diversity of multi-agent systems (MAS)—where
LLM-based agents with different roles and perspectives interact—could naturally
mitigate this amplification. In this work, we rigorously test this hypothesis and
investigate the phenomenon of bias amplification in MAS across sensitive attributes,
including gender, age, and race. We introduce Discrim-Eval-Open, an open-ended,
multi-option benchmark designed to measure system-level bias and bypass the performative neutrality of modern LLMs. We further propose novel metrics, including
an adaptation of the Gini coefficient, to quantify the extremity of system-wide
outputs. Our experiments reveal that iterative bias amplification is a pervasive
issue that is not solved by MAS architectures. This amplification persists across
various configurations, spanning agent roles, communication topologies, iteration
depths, and model types, even when individual agents exhibit minimal bias in
isolation. Moreover, we observe a systemic tendency to favor younger age groups,
women, and Black communities. Finally, we demonstrate that even the inclusion
of objective, neutral inputs can exacerbate bias amplification, exposing a critical
vulnerability in system-level robustness. These findings challenge the assumption
that architectural complexity alone fosters equity, underscoring the urgent need to
address the fundamental dynamics of bias amplification within LLM-based MAS.
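As a point of reference for the Gini-based metric mentioned above: the abstract does not spell out the paper's adaptation, but a minimal sketch, assuming the standard Gini coefficient is applied to the distribution of favorable system outputs across demographic groups, could look like the following (the function name `gini_extremity` and the share-vector input format are illustrative assumptions, not the paper's definition).

```python
import numpy as np

def gini_extremity(shares):
    """Standard Gini coefficient over a vector of outcome shares.

    shares[g] is the fraction of favorable system outputs assigned to
    demographic group g (a hypothetical input format; the paper's exact
    adaptation is not given in the abstract). Returns 0 for a uniform
    distribution and approaches 1 as one group receives all outcomes.
    """
    x = np.sort(np.asarray(shares, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0.0:
        return 0.0
    # Closed form for sorted data:
    # G = (2 * sum_i i * x_i) / (n * sum_i x_i) - (n + 1) / n
    i = np.arange(1, n + 1)
    return float(2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1.0) / n)

# Uniform shares -> no extremity; concentrated shares -> high extremity.
print(gini_extremity([0.25, 0.25, 0.25, 0.25]))  # 0.0
print(gini_extremity([0.70, 0.10, 0.10, 0.10]))  # 0.45
```

Under this reading, a value near 0 indicates outcomes spread evenly across groups, while values approaching 1 indicate the extreme, skewed system-level outputs the metric is meant to flag.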
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 5704