Consensus Matrix: A Role-Specialized Multi-Agent Framework for Structured Collaborative Decision-Making in Agentic Visual Media Workflows
Keywords: multi-agent systems, consensus learning, reinforcement learning, visual media evaluation, clinical decision support, Kendall's W, role specialization, agentic AI
TL;DR: A role-specialized multi-agent framework that quantifies inter-agent agreement via Kendall's W and uses it to drive adaptive, evidence-grounded deliberation in agentic visual media and clinical workflows.
Abstract: Agentic visual media workflows (multi-agent video quality assessment, creative content evaluation, and clinical decision support) demand structured collaboration among role-specialized agents, yet existing multi-agent systems rely on simple voting or averaging and offer no principled measure of agreement quality. We present the Consensus Matrix, a general role-specialized multi-agent framework that quantifies and optimizes inter-agent agreement in complex, high-stakes workflows. The framework instantiates N role-specialized LLM agents, each producing a structured opinion comprising preference scores, a confidence estimate, role-specific concerns, and a grounded evidence chain. These outputs populate a shared consensus matrix; agreement is measured via Kendall's coefficient of concordance W, which drives an adaptive feedback loop: when W falls below a threshold, targeted feedback is directed at the most discordant agents. Unlike systems with fixed aggregation policies, the full system includes a reinforcement learning (RL) coordinator that learns to select round-to-round interaction strategies, accelerating convergence while preserving decision traceability; the coordinator is modular and can be omitted when the computational budget is limited. We instantiate and validate the framework on oncology multidisciplinary team (MDT) deliberation, a demanding testbed where role diversity, evidence grounding, and consensus quality are all clinically critical. Across five medical benchmarks (MedQA, PubMedQA, DDXPlus, MedBullets, SymCat), our system achieves 87.5% average accuracy, outperforming the strongest baseline by 3.7 percentage points, with a mean concordance of W=0.823 and clinician appropriateness ratings of 8.9/10.
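To make the concordance measure concrete, the following is a minimal sketch of how Kendall's W could be computed from a matrix of agent preference scores, and how a discordant agent might be flagged. It is an illustration under stated assumptions, not the authors' implementation: the function names (`average_ranks`, `kendalls_w`, `most_discordant`), the agents-by-options matrix layout, and the squared-deviation discordance rule are all hypothetical, and the standard formula without tie correction is used.

```python
from typing import List


def average_ranks(scores: List[float]) -> List[float]:
    """Rank scores ascending (1 = lowest); tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(matrix: List[List[float]]) -> float:
    """Kendall's W for m agents (rows) scoring n options (columns).

    Uses W = 12 * S / (m^2 * (n^3 - n)), with no tie correction (assumption).
    """
    m, n = len(matrix), len(matrix[0])
    if m < 1 or n < 2:
        raise ValueError("need at least one agent and two options")
    ranks = [average_ranks(row) for row in matrix]
    totals = [sum(r[j] for r in ranks) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def most_discordant(matrix: List[List[float]]) -> int:
    """Index of the agent whose ranking deviates most from the mean ranking.

    Hypothetical rule: largest sum of squared rank deviations.
    """
    ranks = [average_ranks(row) for row in matrix]
    m, n = len(ranks), len(ranks[0])
    mean_ranks = [sum(r[j] for r in ranks) / m for j in range(n)]
    deviation = [sum((r[j] - mean_ranks[j]) ** 2 for j in range(n)) for r in ranks]
    return max(range(m), key=lambda i: deviation[i])


# Three hypothetical agents scoring three options; the third agent disagrees.
opinions = [
    [0.9, 0.5, 0.1],
    [0.8, 0.6, 0.2],
    [0.1, 0.5, 0.9],
]
w = kendalls_w(opinions)
target = most_discordant(opinions)
```

In a feedback loop of the kind the abstract describes, a value of `w` below a chosen threshold would trigger targeted feedback to the agent at index `target`; the threshold itself is not specified here and would need to come from the paper.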
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 8