ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments

Ruolan Yang, Zitong Li, Haifeng Tang, Kenny Q. Zhu

Published: 2022, Last Modified: 19 May 2025. Venue: ACL (1) 2022. License: CC BY-SA 4.0

Abstract: Existing automatic evaluation systems for chatbots mostly rely on static chat scripts as ground truth, which are hard to obtain, and require access to the bots' models as a form of “white-box testing”. Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other, as in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently of their model architectures and the domains for which they are trained.
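
To make the tournament idea concrete, the following is a minimal sketch of bots treated as black boxes, paired off to chat with each other, and ranked by a pluggable scoring function. It is not the paper's implementation: the abstract does not specify the match format or the metrics, so the round-robin schedule, the names `play_match` and `run_tournament`, the three toy bots, and the distinct-word scorer are all assumptions chosen for illustration.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

# Assumed interfaces (hypothetical, not from the paper):
# a bot maps the dialogue history to its next utterance (black-box access only);
# a scorer maps (transcript, bot name) to a number, higher is better.
Bot = Callable[[List[str]], str]
Transcript = List[Tuple[str, str]]
Scorer = Callable[[Transcript, str], float]


def play_match(name_a: str, bot_a: Bot, name_b: str, bot_b: Bot,
               turns: int = 4, opener: str = "Hello!") -> Transcript:
    """Let two bots talk to each other; bot A speaks first after a neutral seed."""
    transcript: Transcript = [("<seed>", opener)]
    speakers = [(name_a, bot_a), (name_b, bot_b)]
    for i in range(turns):
        name, bot = speakers[i % 2]
        history = [utterance for _, utterance in transcript]
        transcript.append((name, bot(history)))
    return transcript


def run_tournament(bots: Dict[str, Bot], scorer: Scorer,
                   turns: int = 4) -> List[Tuple[str, float]]:
    """Round-robin: every ordered pair plays once, so each bot opens once per opponent."""
    totals = {name: 0.0 for name in bots}
    for a, b in permutations(bots, 2):
        transcript = play_match(a, bots[a], b, bots[b], turns)
        totals[a] += scorer(transcript, a)
        totals[b] += scorer(transcript, b)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy bots, for illustration only.
def echo_bot(history: List[str]) -> str:
    return history[-1]


def short_bot(history: List[str]) -> str:
    return "ok"


def chatty_bot(history: List[str]) -> str:
    return f"Interesting, tell me more about turn {len(history)}"


def distinct_words(transcript: Transcript, name: str) -> float:
    """Toy metric (an assumption): number of distinct words the bot used in the match."""
    words = set()
    for speaker, utterance in transcript:
        if speaker == name:
            words.update(utterance.lower().split())
    return float(len(words))


BOTS = {"echo": echo_bot, "short": short_bot, "chatty": chatty_bot}
ranking = run_tournament(BOTS, distinct_words)
```

Because the scorer is passed in as an argument, a different metric can be swapped in without changing the tournament loop, which is one way to read the "flexible scoring metrics" in the abstract. The bots are only ever called through their input and output text, so nothing here depends on model internals.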