Open-Domain Dialog Evaluation Using Follow-Ups Likelihood

Maxime De Bruyn, Ehsan Lotfi, Jeska Buhmann, Walter Daelemans

Published: COLING 2022. Last Modified: 19 May 2025. License: CC BY-SA 4.0.

Abstract: Automatic evaluation of open-domain dialogs remains an unsolved problem. Existing methods do not correlate strongly with human annotations. In this paper, we present a new automated evaluation method based on follow-ups. We measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., "not really relevant here", "what are you trying to say?"). When compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations.
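The scoring idea described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `log_prob` is a hypothetical function returning log P(follow-up | dialog) from some language model. Joining turns with newlines, averaging over follow-ups, and negating the mean (so that a higher score means a better dialog, on the reading that the quoted follow-ups signal a poor response) are all assumptions. `toy_log_prob` is a made-up stand-in so the example runs without a model.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical interface: log P(continuation | context) under some language model.
LogProbFn = Callable[[str, str], float]

# The two follow-ups quoted in the abstract; the paper's full fixed set may differ.
FOLLOW_UPS: Tuple[str, ...] = (
    "not really relevant here",
    "what are you trying to say?",
)


def follow_up_score(
    dialog: Sequence[str],
    log_prob: LogProbFn,
    follow_ups: Sequence[str] = FOLLOW_UPS,
) -> float:
    """Score a dialog by how unlikely the model finds each follow-up.

    Assumptions (not taken from the paper): turns are joined with newlines,
    log-probabilities are averaged over follow-ups, and the mean is negated
    so that a higher score means a better dialog.
    """
    if not follow_ups:
        raise ValueError("need at least one follow-up")
    context = "\n".join(dialog)
    mean_log_prob = sum(log_prob(context, f) for f in follow_ups) / len(follow_ups)
    return -mean_log_prob


def toy_log_prob(context: str, continuation: str) -> float:
    """Stand-in for a real language model, for illustration only.

    Ignores which follow-up is asked and pretends that confused follow-ups
    are likely after a very short last turn.
    """
    last_turn = context.split("\n")[-1]
    p = 0.5 if len(last_turn.split()) < 3 else 0.05
    return math.log(p)


vague = ["Do you like hiking?", "ok"]
detailed = ["Do you like hiking?", "Yes, I hike in the Ardennes most weekends."]
# The detailed reply makes the follow-ups less likely, so it scores higher.
print(follow_up_score(detailed, toy_log_prob) > follow_up_score(vague, toy_log_prob))
```

In practice `toy_log_prob` would be replaced by a function that sums the token log-probabilities of each follow-up under a dialog language model conditioned on the conversation; the abstract does not specify how the paper formats the context or aggregates these probabilities.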