Keywords: Large Language Models, Foundational Models, Bias Mitigation, Equitable, Reliable, Sustainable, Cognitive Bias, Agentic Frameworks, Fairness
TL;DR: We introduce BIASGUARRD, a multi-agent framework that detects and mitigates cognitive biases in LLMs during conflict resolution, reducing judgment shifts by up to 63.3% and promoting fairer decisions across high-stakes, socially grounded scenarios.
Abstract: As foundation models (FMs) are increasingly deployed in socially sensitive domains, such as human-centered decision-making across professional and social contexts, ensuring their reliability in navigating high-stakes situations is essential. Large language models (LLMs), in particular, often mirror human cognitive biases (systematic deviations from rational judgment) that can lead to unfair or inconsistent outcomes. While prior work has identified cognitive biases in LLMs, we uniquely examine how these biases manifest in the context of interpersonal conflict resolution. Specifically, we investigate how biased prompt phrasing influences model responses and how such biases can be mitigated. We (1) present a novel, modular benchmark of 100 human-annotated, neutral interpersonal conflict scenarios spanning family, workplace, community, and friendship domains. To evaluate model vulnerability, we systematically inject four cognitive biases: affective framing, halo effect, framing effect, and serial order bias. We (2) find that, during evaluation, LLMs shift their judgment relative to the neutral baseline in response to biased prompt variants 31%-79% of the time.
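For concreteness, the sketch below shows one way such a judgment-shift evaluation could be scored: each neutral scenario is paired with bias-injected rewrites, and a shift is counted whenever the model's verdict on a biased variant differs from its verdict on the neutral prompt. The `Scenario` class, the `query_model` callable, and the bias templates are hypothetical placeholders, not the benchmark's actual prompts or code.

```python
# Minimal sketch (not the paper's code) of the judgment-shift evaluation:
# compare a model's verdict on a neutral conflict scenario against its
# verdicts on bias-injected rewrites of the same scenario.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Scenario:
    description: str   # neutral, human-annotated conflict text
    question: str      # e.g., "Who is more at fault, A or B?"

# Hypothetical prompt rewrites, one per cognitive bias studied in the paper.
BIAS_TEMPLATES: Dict[str, Callable[[Scenario], str]] = {
    "affective_framing": lambda s: f"{s.description}\nA was clearly hurt and upset. {s.question}",
    "halo_effect":       lambda s: f"A is widely admired by everyone. {s.description}\n{s.question}",
    "framing_effect":    lambda s: f"{s.description}\nConsidering everything B lost, {s.question}",
    "serial_order":      lambda s: f"{s.question}\n{s.description}",  # reorder question and context
}

def judgment_shift_rate(scenarios: List[Scenario],
                        query_model: Callable[[str], str]) -> float:
    """Fraction of (scenario, bias) pairs whose verdict differs from the neutral verdict."""
    shifts, total = 0, 0
    for s in scenarios:
        neutral_verdict = query_model(f"{s.description}\n{s.question}")
        for inject in BIAS_TEMPLATES.values():
            biased_verdict = query_model(inject(s))
            shifts += int(biased_verdict.strip() != neutral_verdict.strip())
            total += 1
    return shifts / max(total, 1)
```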
We (3) introduce BIASGUARRD (Bias Governance using Debiasing Agents and Reliable Reasoning-based Decision-making), a novel multi-agent framework that outperforms existing mitigation strategies, reducing judgment inconsistency across LLMs by up to 63.3% even when models are presented with biased scenarios. BIASGUARRD mitigates reasoning flaws in LLMs by detecting biases and dynamically applying targeted interventions that guide models toward more equitable decision-making. Our work offers a diagnostic framework for identifying and addressing unreliable behaviors in FMs, contributing to more trustworthy deployment in socially grounded applications. The code is available at https://anonymous.4open.science/r/BiasGUARRD-060F/README.md.
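The abstract describes BIASGUARRD as detecting the bias present in a prompt and then applying a targeted intervention before a final decision is made. A minimal sketch of that detect-then-intervene pattern, with illustrative agent prompts and an assumed `llm` callable (not the authors' implementation), might look like:

```python
# Hedged sketch of the detect-then-intervene pattern described in the abstract:
# a detector step labels the suspected bias, a per-bias intervention augments
# the prompt, and a final step issues the judgment. Prompts and labels are
# illustrative placeholders, not BIASGUARRD's actual agents.

from typing import Callable

BIAS_LABELS = ["affective_framing", "halo_effect", "framing_effect",
               "serial_order", "none"]

# Hypothetical targeted interventions, one per detected bias.
INTERVENTIONS = {
    "affective_framing": "Ignore emotionally charged wording; restate the facts neutrally before judging.",
    "halo_effect":       "Disregard reputation or likability cues; judge only the described actions.",
    "framing_effect":    "Re-express the situation in both gain and loss terms before judging.",
    "serial_order":      "Re-read all parties' accounts; do not weight earlier-mentioned information more.",
}

def resolve_conflict(prompt: str, llm: Callable[[str], str]) -> str:
    # 1. Detector: classify which cognitive bias (if any) the prompt exhibits.
    detection = llm(
        "Which of these biases does the following prompt exhibit: "
        f"{', '.join(BIAS_LABELS)}? Answer with one label only.\n\n{prompt}"
    ).strip().lower()
    bias = detection if detection in INTERVENTIONS else "none"

    # 2. Debiasing: apply a targeted intervention for the detected bias.
    guidance = INTERVENTIONS.get(bias, "")
    debiased_prompt = f"{guidance}\n\n{prompt}" if guidance else prompt

    # 3. Decision: produce the final, reasoned judgment on the debiased prompt.
    return llm(f"{debiased_prompt}\n\nGive a fair, well-reasoned resolution.")
```

In the paper's framework these steps are handled by dedicated agents; here they are collapsed into sequential calls to a single model for brevity.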
Submission Number: 135