ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments

Published: 01 Jan 2022, Last Modified: 19 May 2025 · ACL (1) 2022 · CC BY-SA 4.0
Abstract: Existing automatic evaluation systems for chatbots mostly rely on static chat scripts as ground truth, which are hard to obtain and require access to the bots' models, a form of "white-box testing". Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other, as in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently of their model architectures and the domains for which they are trained.
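To make the tournament idea concrete, here is a minimal sketch of pairwise chat matches with a pluggable scoring metric and a round-robin ranking. All names, the match format, and the toy length-based scorer are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch: bots play pairwise "matches" (chats with each other), a pluggable
# scorer awards points per transcript, and total points give the ranking.
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Bot = Callable[[str], str]                         # maps the last utterance to a reply
Scorer = Callable[[List[str]], Dict[str, float]]   # scores a transcript per bot name

def play_match(name_a: str, bot_a: Bot, name_b: str, bot_b: Bot,
               turns: int = 4, opener: str = "Hello!") -> List[str]:
    """Let two bots converse for a fixed number of alternating turns."""
    transcript: List[str] = []
    utterance = opener
    speakers = [(name_a, bot_a), (name_b, bot_b)]
    for turn in range(turns):
        name, bot = speakers[turn % 2]
        utterance = bot(utterance)
        transcript.append(f"{name}: {utterance}")
    return transcript

def length_scorer(transcript: List[str]) -> Dict[str, float]:
    """Toy metric: reward longer replies. Any automatic metric could be swapped in."""
    scores: Dict[str, float] = {}
    for line in transcript:
        name, text = line.split(": ", 1)
        scores[name] = scores.get(name, 0.0) + len(text.split())
    return scores

def run_tournament(bots: Dict[str, Bot],
                   scorer: Scorer = length_scorer) -> List[Tuple[str, float]]:
    """Round-robin: every pair of bots plays one match; totals give the ranking."""
    totals = {name: 0.0 for name in bots}
    for name_a, name_b in combinations(bots, 2):
        transcript = play_match(name_a, bots[name_a], name_b, bots[name_b])
        for name, score in scorer(transcript).items():
            totals[name] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Three hypothetical bots; any model can participate as long as it
    # exposes a text-in, text-out interface (no access to its internals needed).
    bots = {
        "echo": lambda msg: f"You said: {msg}",
        "terse": lambda msg: "Ok.",
        "curious": lambda msg: f"Interesting. Why do you say '{msg}'?",
    }
    for rank, (name, score) in enumerate(run_tournament(bots), start=1):
        print(f"{rank}. {name}: {score:.1f}")
```

Because matches only require each bot's text-in, text-out interface, the ranking is independent of model architecture, and the scorer can be replaced to suit the target domain.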