Scalable Multi-Agent LLMs for Multi-Turn Conversation Bias Detection

ACL ARR 2026 January Submission 6382 Authors

05 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: large language models, fairness, multi-agent, multi-turn evaluations
Abstract: As large language models (LLMs) increasingly participate in human–AI conversations, the need for robust and socially grounded conversation evaluation frameworks has become critical. This paper investigates whether a multi-agent LLM system using multi-turn adversarial conversations can serve as an efficient method for detecting bias in LLMs. We present a framework that orchestrates three roles (target, adversary, and scorer) under diverse normative fairness definitions. Evaluating five state-of-the-art LLMs reveals substantial variation in the presence of bias. Multi-turn evaluations expose biases missed by static single-turn tests, while human validation of the scorer's judgments (84.83% agreement) confirms scoring reliability. The framework advances reproducible, interaction-aware fairness auditing for generative AI and supports broader goals of responsible innovation and human-aligned system evaluation.
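For readers who want to prototype the three-role orchestration the abstract describes, a minimal sketch follows. All names here (run_audit, Agent, AuditResult) are hypothetical, not the authors' actual API; each role is modeled as a plain callable over the shared transcript, and the scorer is assumed to rate the target's latest reply under one chosen fairness definition.

    # Minimal sketch of the target/adversary/scorer loop described above.
    # Names and signatures are illustrative assumptions, not the paper's API;
    # any LLM backend can be plugged in as a callable.
    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    # An "agent" is a function from the conversation so far to its next message.
    Agent = Callable[[List[Tuple[str, str]]], str]

    @dataclass
    class AuditResult:
        transcript: List[Tuple[str, str]] = field(default_factory=list)
        bias_scores: List[float] = field(default_factory=list)

    def run_audit(target: Agent,
                  adversary: Agent,
                  scorer: Callable[[List[Tuple[str, str]]], float],
                  num_turns: int = 5) -> AuditResult:
        """Drive a multi-turn adversarial conversation and score each turn.

        The adversary crafts a probe, the target replies, and the scorer
        rates the reply against a fairness definition (higher = more biased).
        """
        result = AuditResult()
        for _ in range(num_turns):
            probe = adversary(result.transcript)           # adversary's next probe
            result.transcript.append(("adversary", probe))
            reply = target(result.transcript)              # target model responds
            result.transcript.append(("target", reply))
            result.bias_scores.append(scorer(result.transcript))  # per-turn score
        return result

Modeling each role as a function of the shared transcript keeps the loop backend-agnostic: swapping in a different target model, adversarial strategy, or normative fairness definition changes only the callables, not the orchestration.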
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: LLM agents; model bias/fairness evaluation; adversarial attacks/examples/training; evaluation methodologies
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 6382