Keywords: AI; Deliberation
Abstract: Can large language models (LLMs) deliberate well across varying discussion structures? This study investigates this question by examining how structural norms and attitude certainty shape the deliberative quality and belief dynamics of multi-agent LLM dialogues. We implemented a 2×2 factorial design (structured vs. unstructured × high vs. low certainty) in which role-conditioned LLM agents engaged in multi-round debates on the commercial use of AI-generated art. Dialogue transcripts were evaluated using the Deliberative Quality Index (DQI) and stance-flow analysis to capture both static deliberative quality and dynamic belief revision. Results show that structure enhanced civility and coherence, while certainty improved justification and interactivity. The combination of structured interaction and high certainty produced the strongest overall deliberative quality, whereas unstructured low-certainty dialogues consistently underperformed. Across all conditions, however, constructive solution-building remained limited, and LLMs failed to replicate the nuanced facilitative role of human moderators. These findings suggest that while LLMs can approximate key features of deliberation under controlled conditions, further advances—such as memory and planning modules or hybrid human–AI facilitation—are needed to move beyond procedural compliance toward genuinely constructive deliberation.
Supplementary Material: zip
Submission Number: 86