Relevance in Dialogue: An empirical comparison of existing metrics, and a novel simple metric

Anonymous

17 Sept 2021 (modified: 05 May 2023) · ACL ARR 2021 September Blind Submission
Abstract: In this work, we evaluate several existing dialogue relevance metrics and find strong dataset dependence, often with poor correlation against human relevance scores. We propose modifications that reduce data requirements while improving correlation. With these changes, our metric achieves a new state of the art on the HUMOD dataset (Merdivan et al., 2020), without fine-tuning, using only 3,750 unannotated human dialogues and a single negative example. Despite these constraints, we demonstrate competitive performance on three datasets from different domains. Our code, including our metric and data processing, is open-sourced.
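The abstract's central evaluation is the correlation between a metric's outputs and human relevance judgments. A minimal sketch of that protocol is below, assuming standard Spearman/Pearson correlation as used in dialogue-metric evaluation; the metric scores and human ratings shown are illustrative placeholders, not the paper's data or released code.

```python
# Sketch: correlating automatic relevance scores with human annotations.
# Assumes each (context, response) pair has one metric score and one
# mean human relevance rating, e.g. HUMOD-style annotations.
from scipy.stats import pearsonr, spearmanr

def evaluate_metric(metric_scores, human_scores):
    """Return rank and linear correlation between metric and human scores."""
    rho, _ = spearmanr(metric_scores, human_scores)
    r, _ = pearsonr(metric_scores, human_scores)
    return {"spearman": rho, "pearson": r}

# Toy example: five responses scored by a hypothetical relevance metric
# (0-1) and rated by humans (1-5 scale).
metric_scores = [0.91, 0.12, 0.55, 0.78, 0.33]
human_scores = [4.6, 1.2, 3.1, 4.0, 2.2]
print(evaluate_metric(metric_scores, human_scores))
```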