Keywords: dialogue evaluation, information gain, multi-turn dialogue, uncertainty reduction, information-seeking dialogue, embedding-based metrics, LLM evaluation, efficient evaluation
Abstract: Evaluating multi-turn dialogue systems remains challenging, as dialogue quality depends on how
effectively an agent accumulates relevant information across turns.
In this work, we propose a fast, information-theoretic metric for evaluating multi-turn dialogue
based on uncertainty reduction in embedding space over the course of a conversation.
Our approach admits a tractable Gaussian approximation and enjoys desirable theoretical properties,
including monotonicity, telescoping over turns, and submodularity.
Unlike recent approaches that rely on large language models as judges, our method is fully
reference-free (no ground-truth answers, no gold references, no human annotations at evaluation time),
deterministic, and computationally efficient. We show that the proposed metric remains effective even when instantiated with extremely lightweight
embedding models under CPU-only execution, indicating that the evaluative signal does not require
large model capacity or autoregressive inference.
We evaluate the proposed metric on MT-Bench and Chatbot Arena, where its agreement with human
preferences is competitive with several LLM-as-a-judge baselines and, on MT-Bench, surpasses them.
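
The abstract leaves the construction implicit; below is a minimal sketch, not the paper's implementation, assuming a Gaussian (GP-style) information gain over turn embeddings with a linear kernel and isotropic observation noise. All names (`per_turn_gains`, `noise`) are illustrative. Under these assumptions, the 0.5 log det(I + noise⁻² K) form yields per-turn gains that are non-negative (monotonicity), sum to the total gain (telescoping), and exhibit diminishing returns (submodularity), matching the properties claimed above.

```python
import numpy as np


def logdet_info(embs: np.ndarray, noise: float = 0.1) -> float:
    """0.5 * log det(I + noise^-2 * K), with K the linear-kernel Gram
    matrix of the turn embeddings seen so far (Gaussian information gain)."""
    if embs.shape[0] == 0:
        return 0.0
    k = embs @ embs.T  # (n_turns, n_turns) Gram matrix
    _, logdet = np.linalg.slogdet(np.eye(embs.shape[0]) + k / noise**2)
    return 0.5 * logdet


def per_turn_gains(turn_embs: np.ndarray) -> list[float]:
    """Information gain of each turn: the marginal increase in
    0.5 log det(I + noise^-2 K) when that turn's embedding is added.
    Gains are non-negative, telescope to the total, and shrink for
    turns that repeat earlier content (submodularity)."""
    gains, prev = [], 0.0
    for t in range(1, turn_embs.shape[0] + 1):
        cur = logdet_info(turn_embs[:t])
        gains.append(cur - prev)
        prev = cur
    return gains


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embs = rng.normal(size=(4, 16))
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)  # unit-norm rows
    # Append a duplicate of turn 0: its gain should be far smaller
    # than the first occurrence's, illustrating diminishing returns.
    embs = np.vstack([embs, embs[0]])
    print([round(g, 3) for g in per_turn_gains(embs)])
```

In this toy run, the duplicated final turn receives a much smaller gain than its first occurrence, and the whole computation is a few small matrix operations, consistent with the claim that a lightweight, CPU-only embedding model suffices to supply the turn embeddings.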
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: evaluation and metrics
Contribution Types: Approaches low compute settings-efficiency, Theory
Languages Studied: English
Submission Number: 987