An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring

ACL ARR 2025 May Submission 4353 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: While multi-agent LLM systems show strong capabilities in various domains, they are highly vulnerable to adversarial and low-performing agents. To address this issue, we introduce a general, adversary-resistant multi-agent LLM framework based on credibility scoring. We model the collaborative query-answering process as an iterative game in which agents communicate and contribute to a final system output. Our system associates a credibility score with each agent, which is used when aggregating the team's outputs. Credibility scores are learned gradually from each agent's past contributions to query answering. Our experiments across multiple tasks and settings demonstrate the system's effectiveness in mitigating adversarial influence and enhancing the resilience of multi-agent cooperation, even in adversary-majority settings.
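The abstract describes credibility-weighted aggregation of agent outputs, with scores learned iteratively from past contributions. Below is a minimal Python sketch of that general idea, not the paper's actual method: the weighted majority-vote aggregation, the agreement-based update signal, and the learning rate `lr` are all assumptions made for illustration.

```python
from collections import defaultdict

def aggregate(answers, credibility):
    """Credibility-weighted vote over agent answers.

    answers: dict mapping agent id -> proposed answer
    credibility: dict mapping agent id -> current credibility score
    Returns the answer with the highest total credibility mass.
    """
    mass = defaultdict(float)
    for agent, ans in answers.items():
        mass[ans] += credibility.get(agent, 1.0)
    return max(mass, key=mass.get)

def update_credibility(answers, credibility, system_answer, lr=0.1):
    """Nudge each agent's score up or down depending on whether its
    answer matched the aggregated system output this round (an assumed
    proxy signal; the paper's update rule is not specified here)."""
    for agent, ans in answers.items():
        signal = 1.0 if ans == system_answer else -1.0
        credibility[agent] = max(0.0, credibility.get(agent, 1.0) + lr * signal)
    return credibility

# Example round: three cooperative agents and two adversarial agents.
cred = {a: 1.0 for a in ["a1", "a2", "a3", "adv1", "adv2"]}
answers = {"a1": "Paris", "a2": "Paris", "a3": "Paris",
           "adv1": "Lyon", "adv2": "Lyon"}
out = aggregate(answers, cred)
cred = update_credibility(answers, cred, out)
print(out, cred)
```

In a real system the update signal would more likely come from the verified quality of each contribution rather than simple agreement with the aggregate; the sketch uses agreement only because no ground-truth feedback is available in this toy example.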
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Multi-agent Language Models, Credibility Scoring, Adversary-Resistant Coordination, Contribution-Based Aggregation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 4353