TL;DR: We show that Multi-Agent Debate (MAD) reinforces a model's inherent biases because its agents share strong self-consistency, and we propose DReaMAD, which refines strategic prior knowledge and promotes diverse reasoning, outperforming MAD on the MetaNIM Arena benchmark.
Abstract: Large Language Models (LLMs) solve complex problems using training-free methods like prompt engineering and in-context learning, yet ensuring reasoning correctness remains challenging. While self-correction methods such as self-consistency and self-refinement aim to improve reliability, they often reinforce biases due to the lack of effective feedback mechanisms. Multi-Agent Debate (MAD) has emerged as an alternative, but we identify two key limitations: bias reinforcement, where debate amplifies model biases instead of correcting them, and lack of perspective diversity, as all agents share the same model and reasoning patterns, limiting true debate effectiveness. To systematically evaluate these issues, we introduce $\textit{MetaNIM Arena}$, a benchmark designed to assess LLMs in adversarial strategic decision-making, where dynamic interactions influence optimal decisions. To overcome MAD’s limitations, we propose $\texttt{\textbf{DReaMAD}}$ ($\textbf{D}$iverse $\textbf{Rea}$soning via $\textbf{M}$ulti-$\textbf{A}$gent $\textbf{D}$ebate with Refined Prompt), a novel framework that (1) refines LLMs’ strategic prior knowledge to improve reasoning quality and (2) promotes diverse viewpoints within a single model by systematically modifying prompts, reducing bias. Empirical results show that $\texttt{\textbf{DReaMAD}}$ significantly improves decision accuracy, reasoning diversity, and bias mitigation across multiple strategic tasks, establishing it as a more effective approach for LLM-based decision-making.
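To make the two ingredients of $\texttt{\textbf{DReaMAD}}$ concrete, here is a minimal sketch (not the authors' implementation) of a debate loop in which every agent shares one refined strategic prior but argues from a distinct role prompt. All function names, prompt texts, and the stand-in model below are illustrative assumptions, not code from the released repository.

```python
from typing import Callable, List

def dreamad_debate(
    question: str,
    generate: Callable[[str], str],   # any LLM completion function: prompt -> text
    roles: List[str],
    refined_prior: str,
    rounds: int = 2,
) -> List[str]:
    """Run a small multi-agent debate: one shared refined prior, distinct role prompts."""
    # Initial answers: each agent reasons from its own assigned perspective.
    answers = [
        generate(f"{refined_prior}\nYou are {role}.\nQuestion: {question}\nAnswer with reasoning:")
        for role in roles
    ]
    # Debate rounds: each agent sees the others' answers and may revise its own.
    for _ in range(rounds):
        revised = []
        for i, role in enumerate(roles):
            others = "\n\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (
                f"{refined_prior}\nYou are {role}.\n"
                f"Question: {question}\n"
                f"Other agents argued:\n{others}\n"
                "Critique their reasoning and give your final answer:"
            )
            revised.append(generate(prompt))
        answers = revised
    return answers

# Usage with a stand-in model; replace `dummy_llm` with a real completion call.
def dummy_llm(prompt: str) -> str:
    return f"[response to prompt of length {len(prompt)}]"

final = dreamad_debate(
    question="Piles are (3, 5, 7); it is your turn. What is the winning move?",
    generate=dummy_llm,
    roles=["a cautious game theorist", "an aggressive risk-taker", "a devil's advocate"],
    refined_prior="Recall the relevant combinatorial game theory before answering.",
)
print(final)
```

The key design point is that diversity comes from rewriting each copy's role prompt before the debate, rather than from sampling the same prompt several times.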
Lay Summary: Large language models (LLMs) such as GPT and Gemini can already answer questions, write stories, and even solve puzzles just by reading a few examples. Yet, they sometimes arrive at confident but wrong or biased conclusions. A popular fix is to let the model “think again” (self-correction) or to make two copies of the model argue with each other (multi-agent debate). Unfortunately, these approaches often repeat the same blind spots because every copy still reasons in the same way. Our research tackles this problem on two fronts. First, we built $\textit{MetaNIM Arena}$, a game-like testbed in which LLMs must out-plan an adversary whose moves keep changing. This setting makes it easy to spot mistakes and hidden biases in the models’ reasoning. Second, we introduce $\texttt{\textbf{DReaMAD}}$ ($\textbf{D}$iverse $\textbf{Rea}$soning via $\textbf{M}$ulti-$\textbf{A}$gent $\textbf{D}$ebate with Refined Prompt). Instead of cloning the same thinker, $\texttt{\textbf{DReaMAD}}$ rewrites the “role” each copy plays, encouraging genuinely different perspectives before the debate begins. It also gives each copy a quick lesson in strategic thinking, so their arguments are sharper and less biased. Across several challenging decision-making tasks, $\texttt{\textbf{DReaMAD}}$ made the models more accurate, produced a wider range of ideas, and cut down on biased answers.
Link To Code: https://github.com/ericoh929/MetaNIMArena.git
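The linked repository contains the actual benchmark; the snippet below is only a hypothetical sketch of the kind of Nim-style adversarial environment such a benchmark evaluates (it is not the MetaNIM Arena API). The optimal-move rule is the textbook nim-sum (XOR) strategy, which provides an exact ground truth against which a model's chosen move can be scored.

```python
# Hypothetical Nim helper (assumption: illustrative only, not the repository's code).
from functools import reduce
from typing import List, Optional, Tuple

def nim_sum(piles: List[int]) -> int:
    """XOR of all pile sizes; the position is losing for the mover iff it is 0."""
    return reduce(lambda a, b: a ^ b, piles, 0)

def optimal_move(piles: List[int]) -> Optional[Tuple[int, int]]:
    """Return (pile_index, stones_to_remove) for a winning move, or None if
    every move loses against perfect play."""
    s = nim_sum(piles)
    if s == 0:
        return None
    for i, p in enumerate(piles):
        target = p ^ s
        if target < p:
            return i, p - target
    return None

def score_llm_move(piles: List[int], move: Tuple[int, int]) -> bool:
    """True if the model's move leaves the opponent in a losing (nim-sum 0) position."""
    i, k = move
    new_piles = piles.copy()
    new_piles[i] -= k
    return nim_sum(new_piles) == 0

print(optimal_move([3, 5, 7]))            # (0, 1): take 1 stone from the first pile
print(score_llm_move([3, 5, 7], (0, 1)))  # True
```

Because perfect play is computable, any confidently wrong move by an LLM (or by a debate of identical LLMs) is directly observable, which is what makes this setting useful for exposing bias reinforcement.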
Primary Area: Deep Learning->Large Language Models
Keywords: Language Model, Multi-Agent Debate, Bias
Submission Number: 9281