C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

Anonymous

Published: 23 May 2023, Last Modified: 12 Mar 2024
Venue: DialDoc 2023 Poster
Readers: Everyone
Keywords: Dialogue Evaluation, Mutual Information, Text Evaluation, Natural Language Processing
TL;DR: An unsupervised, reference-free dialogue evaluation metric based on conditional mutual information that captures the interaction between the system and the user under a given hypothesis.
Abstract: Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user with respect to a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. Replacing the negative log-likelihood-based scorer of the FED evaluation metric with our proposed C-PMI scorer yields a 60.5% relative improvement in Spearman correlation on average.
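For readers unfamiliar with the quantity named in the title, the following is a minimal sketch of the textbook definition of conditional pointwise mutual information, PMI(x; y | z) = log p(x, y | z) - log p(x | z) - log p(y | z). The abstract does not state which of the dialogue context, the system response, and the hypothesis plays the role of x, y, and z, nor how the probabilities are estimated, so the variable names, the function name `conditional_pmi`, and the toy probabilities below are assumptions made purely for illustration and are not the authors' implementation.

```python
import math


def conditional_pmi(logp_xy_given_z: float,
                    logp_x_given_z: float,
                    logp_y_given_z: float) -> float:
    """Textbook conditional PMI computed from log-probabilities.

    PMI(x; y | z) = log p(x, y | z) - log p(x | z) - log p(y | z)

    Hypothetical helper: in practice the log-probabilities would come
    from some language model, which this sketch does not specify.
    """
    return logp_xy_given_z - logp_x_given_z - logp_y_given_z


# Toy numbers, invented for illustration only.
p_x = 0.3  # p(x | z)
p_y = 0.2  # p(y | z)

# Case 1: x and y co-occur twice as often as independence would predict.
dependent_score = conditional_pmi(math.log(0.12), math.log(p_x), math.log(p_y))

# Case 2: x and y are conditionally independent given z, so the score is ~0.
independent_score = conditional_pmi(math.log(p_x * p_y), math.log(p_x), math.log(p_y))
```

A positive score means x and y are more likely to occur together, given z, than they would be if they were conditionally independent; a score near zero means that y carries no extra information about x once z is known.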
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2306.15245/code)