Abstract: Multi-agent debate improves the accuracy of Large Language Models (LLMs) by having several agents discuss solutions to a problem over multiple rounds. However, agents often produce incorrect yet confident-sounding responses that can mislead the others, partly because agents do not account for how confident their peers are. To address this, we propose DebUnc, a debate framework that uses uncertainty metrics to estimate agent confidence. Confidence is then conveyed either through a modified attention mechanism that adjusts token weights or through textual prompts. Evaluations across benchmarks show that attention-based methods are particularly effective and that performance continues to improve as uncertainty estimation becomes more reliable.
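To make the attention-based idea concrete, below is a minimal, illustrative sketch of one way per-agent confidence could rescale attention over another agent's tokens. The function name, the choice to scale post-softmax weights, and the renormalization step are assumptions for illustration, not the paper's exact implementation.

```python
import torch

def confidence_scaled_attention(attn_weights: torch.Tensor,
                                token_confidence: torch.Tensor) -> torch.Tensor:
    """Illustrative sketch (not the authors' exact method):
    scale each key token's attention weight by the confidence of the agent
    that produced it, then renormalize so each row still sums to 1.

    attn_weights:     (..., num_queries, num_keys) post-softmax attention weights
    token_confidence: (num_keys,) per-token confidence scores in [0, 1]
    """
    scaled = attn_weights * token_confidence              # down-weight tokens from low-confidence agents
    return scaled / scaled.sum(dim=-1, keepdim=True)      # renormalize over the key dimension
```

Under this sketch, tokens written by a low-confidence agent receive proportionally less attention from the reading agent, while textual-prompt variants would instead state each agent's confidence directly in the prompt.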
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Dialogue and Interactive Systems, Interpretability and Analysis of Models for NLP, Language Modeling
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6618