Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias

ICLR 2026 Conference Submission 24924 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM, Question Answering, Bias
Abstract: Large language models (LLMs) are highly vulnerable to input confirmation bias. When a prompt implies a preferred answer, models often reinforce that bias rather than explore alternatives. This phenomenon remains underexplored, yet it is already harmful in base models and poses an even greater risk in multi-agent debate, where echo chambers reinforce bias instead of correcting it. We introduce \emph{\textbf{M}ixture \textbf{o}f \textbf{La}tent \textbf{C}oncept \textbf{E}xperts (\textbf{MoLaCE})}, a framework that directly addresses confirmation bias through a mixture of latent experts. Our method identifies a latent direction in the model's internal representations that reflects confirmation bias, instantiates experts as different activation strengths along this direction, and employs a gating mechanism to adaptively mix their predictions. This design enables a single LLM to emulate the benefits of debate internally while remaining lightweight and scalable. It can also be integrated into multi-agent debate frameworks to diversify perspectives and reduce correlated errors. We empirically show that MoLaCE consistently reduces confirmation bias, improves robustness, and matches or surpasses multi-agent debate while requiring only a fraction of the computation.
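To make the mechanism described in the abstract concrete, below is a minimal sketch, not the authors' implementation, of how experts defined as activation strengths along a single latent bias direction could be mixed by a gate. All names (MoLaCELayer, bias_direction, alphas) are illustrative assumptions; the actual paper's architecture may differ.

```python
# Hypothetical sketch: experts are fixed activation strengths along one
# "confirmation-bias" direction in hidden space; a learned gate mixes the
# steered hidden states per token.
import torch
import torch.nn as nn

class MoLaCELayer(nn.Module):
    def __init__(self, hidden_dim: int, bias_direction: torch.Tensor,
                 alphas=(-2.0, 0.0, 2.0)):
        super().__init__()
        # Unit-norm latent direction assumed to encode confirmation bias
        # (e.g., extracted from contrasting biased vs. neutral prompts).
        self.register_buffer("v", bias_direction / bias_direction.norm())
        # Each expert k steers the hidden state by a fixed strength alpha_k.
        self.register_buffer("alphas", torch.tensor(alphas))
        # Lightweight gate producing per-token mixture weights over experts.
        self.gate = nn.Linear(hidden_dim, len(alphas))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, hidden_dim) hidden states from a frozen LLM layer.
        weights = torch.softmax(self.gate(h), dim=-1)                        # (B, T, K)
        # Expert k output: h + alpha_k * v, stacked over experts.
        experts = h.unsqueeze(-2) + self.alphas.view(1, 1, -1, 1) * self.v   # (B, T, K, D)
        # Adaptive mixture of the steered representations.
        return (weights.unsqueeze(-1) * experts).sum(dim=-2)                 # (B, T, D)

# Usage: wrap a chosen transformer layer's output with this module.
if __name__ == "__main__":
    D = 16
    layer = MoLaCELayer(D, bias_direction=torch.randn(D))
    out = layer(torch.randn(2, 5, D))
    print(out.shape)  # torch.Size([2, 5, 16])
```

This keeps the base model frozen: only the small gate is trained, which is consistent with the lightweight, single-LLM framing in the abstract.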
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24924