Keywords: Large Language Models, Foundational Models, Bias Mitigation, Equitable, Reliable, Sustainable, Cognitive Bias, Agentic Frameworks, Fairness
TL;DR: We introduce BIASGUARRD, a multi-agent framework that detects and mitigates cognitive biases in LLMs during conflict resolution, reducing judgment shifts by up to 63.3% and promoting fairer decisions across high-stakes, socially grounded scenarios.
Abstract: As foundation models (FMs) are increasingly deployed in socially sensitive domains, such as human-centered decision-making across professional and social contexts, ensuring their reliability in navigating high-stakes situations is essential. Large language models (LLMs), in particular, often mirror human cognitive biases (systematic deviations from rational judgment) that can lead to unfair or inconsistent outcomes. While prior work has identified cognitive biases in LLMs, we uniquely examine how these biases manifest in the context of interpersonal conflict resolution. Specifically, we investigate how biased prompt phrasing influences model responses and how such biases can be mitigated. We (1) present a novel, modular benchmark of 100 human-annotated, neutral interpersonal conflict scenarios spanning family, workplace, community, and friendship domains. To evaluate model vulnerability, we systematically inject four cognitive biases: affective framing, halo effect, framing effect, and serial order bias. We (2) find that, during evaluation, LLMs shift their judgment relative to the neutral baseline in response to biased prompt variants 31%-79% of the time.
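For concreteness, the sketch below shows one way such a judgment-shift evaluation could be scored: each neutral scenario is paired with bias-injected rewrites, and a shift is counted whenever the model's verdict on a biased variant differs from its verdict on the neutral prompt. The `Scenario` class, the `query_model` callable, and the bias templates are hypothetical placeholders, not the benchmark's actual prompts or code.

```python
# Minimal sketch (not the paper's code) of the judgment-shift evaluation:
# compare a model's verdict on a neutral conflict scenario against its
# verdicts on bias-injected rewrites of the same scenario.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Scenario:
    description: str   # neutral, human-annotated conflict text
    question: str      # e.g., "Who is more at fault, A or B?"

# Hypothetical prompt rewrites, one per cognitive bias studied in the paper.
BIAS_TEMPLATES: Dict[str, Callable[[Scenario], str]] = {
    "affective_framing": lambda s: f"{s.description}\nA was clearly hurt and upset. {s.question}",
    "halo_effect":       lambda s: f"A is widely admired by everyone. {s.description}\n{s.question}",
    "framing_effect":    lambda s: f"{s.description}\nConsidering everything B lost, {s.question}",
    "serial_order":      lambda s: f"{s.question}\n{s.description}",  # reorder question and context
}

def judgment_shift_rate(scenarios: List[Scenario],
                        query_model: Callable[[str], str]) -> float:
    """Fraction of (scenario, bias) pairs whose verdict differs from the neutral verdict."""
    shifts, total = 0, 0
    for s in scenarios:
        neutral_verdict = query_model(f"{s.description}\n{s.question}")
        for inject in BIAS_TEMPLATES.values():
            biased_verdict = query_model(inject(s))
            shifts += int(biased_verdict.strip() != neutral_verdict.strip())
            total += 1
    return shifts / max(total, 1)
```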
We (3) introduce BIASGUARRD (Bias Governance using Debiasing Agents and Reliable Reasoning-based Decision-making), a novel multi-agent framework that outperforms existing mitigation strategies, reducing judgment inconsistency across LLMs by up to 63.3% even when models are presented with biased scenarios. BIASGUARRD mitigates reasoning flaws in LLMs by detecting biases and dynamically applying targeted interventions that guide models toward more equitable decision-making. Our work offers a diagnostic framework for identifying and addressing unreliable behaviors in FMs, contributing to more trustworthy deployment in socially grounded applications. The code is available at https://anonymous.4open.science/r/BiasGUARRD-060F/README.md.
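The abstract describes BIASGUARRD as detecting the bias present in a prompt and then applying a targeted intervention before a final decision is made. A minimal sketch of that detect-then-intervene pattern, with illustrative agent prompts and an assumed `llm` callable (not the authors' implementation), might look like:

```python
# Hedged sketch of the detect-then-intervene pattern described in the abstract:
# a detector step labels the suspected bias, a per-bias intervention augments
# the prompt, and a final step issues the judgment. Prompts and labels are
# illustrative placeholders, not BIASGUARRD's actual agents.

from typing import Callable

BIAS_LABELS = ["affective_framing", "halo_effect", "framing_effect",
               "serial_order", "none"]

# Hypothetical targeted interventions, one per detected bias.
INTERVENTIONS = {
    "affective_framing": "Ignore emotionally charged wording; restate the facts neutrally before judging.",
    "halo_effect":       "Disregard reputation or likability cues; judge only the described actions.",
    "framing_effect":    "Re-express the situation in both gain and loss terms before judging.",
    "serial_order":      "Re-read all parties' accounts; do not weight earlier-mentioned information more.",
}

def resolve_conflict(prompt: str, llm: Callable[[str], str]) -> str:
    # 1. Detector: classify which cognitive bias (if any) the prompt exhibits.
    detection = llm(
        "Which of these biases does the following prompt exhibit: "
        f"{', '.join(BIAS_LABELS)}? Answer with one label only.\n\n{prompt}"
    ).strip().lower()
    bias = detection if detection in INTERVENTIONS else "none"

    # 2. Debiasing: apply a targeted intervention for the detected bias.
    guidance = INTERVENTIONS.get(bias, "")
    debiased_prompt = f"{guidance}\n\n{prompt}" if guidance else prompt

    # 3. Decision: produce the final, reasoned judgment on the debiased prompt.
    return llm(f"{debiased_prompt}\n\nGive a fair, well-reasoned resolution.")
```

In the paper's framework these steps are handled by dedicated agents; here they are collapsed into sequential calls to a single model for brevity.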
Submission Number: 135