Abstract: A systematic, reliable, and low-cost evaluation of Conversational Recommender Systems (CRSs) remains an open challenge.
Existing automatic CRS evaluation methods have been shown to be insufficient for capturing the dynamic nature of recommendation conversations.
This work proposes FACE: a Fine-grained, Aspect-based Conversation Evaluation method that provides evaluation scores for diverse turn- and dialogue-level qualities of recommendation conversations.
FACE is reference-free and correlates strongly with human judgments, achieving a system-level correlation of 0.9 and turn/dialogue-level correlations of 0.5, outperforming state-of-the-art CRS evaluation methods by a large margin.
Additionally, unlike existing LLM-based methods that produce a single, uninterpretable score, FACE provides insight into system performance and makes it possible to identify and locate problems within conversations.
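To illustrate what the reported system-level versus turn-level correlations measure, here is a minimal sketch (not the paper's code) of one common way such correlations with human judgments might be computed. The data, system names, and the choice of Spearman correlation are assumptions for illustration only.

```python
# Hypothetical sketch: correlating automatic metric scores with human
# judgments at the turn level and at the system level.
# The systems and (metric, human) score pairs below are made-up examples.
from statistics import mean
from scipy.stats import spearmanr

scores = {
    "crs_a": [(0.8, 0.7), (0.6, 0.5), (0.9, 0.9)],  # (metric, human) per turn
    "crs_b": [(0.4, 0.5), (0.3, 0.2), (0.5, 0.6)],
    "crs_c": [(0.7, 0.8), (0.5, 0.4), (0.6, 0.7)],
}

# Turn-level: correlate metric and human scores across all individual turns.
metric_turns = [m for pairs in scores.values() for m, _ in pairs]
human_turns = [h for pairs in scores.values() for _, h in pairs]
turn_rho, _ = spearmanr(metric_turns, human_turns)

# System-level: average scores per system first, then correlate system means.
metric_sys = [mean(m for m, _ in pairs) for pairs in scores.values()]
human_sys = [mean(h for _, h in pairs) for pairs in scores.values()]
sys_rho, _ = spearmanr(metric_sys, human_sys)

print(f"turn-level rho={turn_rho:.2f}, system-level rho={sys_rho:.2f}")
```

System-level correlation is typically higher than turn-level correlation, as in the abstract's reported 0.9 versus 0.5, because averaging over turns smooths out per-turn noise.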
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Corpus Creation, Benchmarking, Automatic Evaluation, Evaluation Methodologies, Evaluation, Metrics
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 7971