BIASGUARRD: Enhancing Fairness and Reliability in LLM Conflict Resolution Through Agentic Debiasing

Published: 01 Jul 2025 · Last Modified: 01 Jul 2025 · ICML 2025 R2-FM Workshop Poster · License: CC BY 4.0
Keywords: Large Language Models, Foundational Models, Bias Mitigation, Equitable, Reliable, Sustainable, Cognitive Bias, Agentic Frameworks, Fairness
TL;DR: We introduce BIASGUARRD, a multi-agent framework that detects and mitigates cognitive biases in LLMs during conflict resolution, reducing judgment shifts by up to 63.3% and promoting fairer decisions across high-stakes, socially grounded scenarios.
Abstract: As foundation models (FMs) are increasingly deployed in socially sensitive domains, such as human-centered decision-making across professional and social contexts, ensuring their reliability in navigating high-stakes situations is essential. Large language models (LLMs), in particular, often mirror human cognitive biases—systematic deviations from rational judgment—that can lead to unfair or inconsistent outcomes. While prior work has identified cognitive biases in LLMs, we uniquely examine how these biases manifest in the context of interpersonal conflict resolution. Specifically, we investigate how biased prompt phrasing influences model responses and how such biases can be mitigated. We (1) present a novel, modular benchmark of 100 human-annotated, neutral interpersonal conflict scenarios spanning family, workplace, community, and friendship domains. To evaluate model vulnerability, we systematically inject four cognitive biases: affective framing, halo effect, framing effect, and serial order bias. We (2) find that during evaluation, LLMs shift their judgment relative to the neutral baseline in response to biased prompt variants 31%-79% of the time. We (3) introduce a novel multi-agent framework BIASGUARRD that outperforms existing mitigation strategies by reducing judgment inconsistency across LLMs, even when presented with biased scenarios, by up to 63.3%. BIASGUARRD (Bias Governance using Debiasing Agents and Reliable Reasoning-based Decision-making) mitigates reasoning flaws in LLMs by detecting biases and dynamically applying targeted interventions to guide models toward more equitable decision-making. Our work offers a diagnostic framework for identifying and addressing unreliable behaviors in FMs, contributing to more trustworthy deployment in socially grounded applications. The code is available at https://anonymous.4open.science/r/BiasGUARRD-060F/README.md.
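The abstract describes BIASGUARRD as a pipeline that first detects cognitive biases in a conflict prompt and then applies targeted interventions before the model renders a judgment. The sketch below illustrates that detect-then-debias flow under stated assumptions: the agent classes, prompt wording, and the `LLM` callable are illustrative stand-ins, not the authors' released implementation (see the linked repository for the actual code).

```python
# Minimal sketch of a detect-then-debias agent pipeline, as described in the
# abstract. All class names, prompts, and the LLM callable are hypothetical;
# they only illustrate the shape of the approach.

from dataclasses import dataclass
from typing import Callable

# The four biases injected into the benchmark scenarios.
BIAS_TYPES = ("affective framing", "halo effect", "framing effect", "serial order bias")

# Any function mapping a prompt string to a model response string.
LLM = Callable[[str], str]


@dataclass
class BiasDetector:
    """Agent that flags which cognitive biases a conflict prompt may carry."""
    llm: LLM

    def detect(self, scenario: str) -> list[str]:
        answer = self.llm(
            "Which of these biases, if any, does the following conflict "
            f"description exhibit: {', '.join(BIAS_TYPES)}? "
            "Reply with a comma-separated list or 'none'.\n\n" + scenario
        )
        return [b for b in BIAS_TYPES if b in answer.lower()]


@dataclass
class DebiasingAgent:
    """Agent that applies a targeted intervention for each detected bias."""
    llm: LLM

    def neutralize(self, scenario: str, biases: list[str]) -> str:
        if not biases:
            return scenario
        return self.llm(
            "Rewrite the conflict below in neutral language, removing the "
            f"influence of these biases: {', '.join(biases)}. "
            "Preserve all factual content.\n\n" + scenario
        )


def resolve_conflict(scenario: str, llm: LLM) -> str:
    """Detect biases, neutralize the prompt, then ask for a final judgment."""
    biases = BiasDetector(llm).detect(scenario)
    neutral = DebiasingAgent(llm).neutralize(scenario, biases)
    return llm("Who is more at fault in this conflict, and why?\n\n" + neutral)
```

In this reading, the framework's reported reduction in judgment shifts would come from the final decision being made on the debiased rewrite rather than on the biased prompt variant; the exact intervention per bias type is detailed in the paper and repository.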
Submission Number: 135