An Evaluation Framework for Emotional Companionship Capability of Dialogue Systems

ACL ARR 2026 January Submission 10390 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: PQAEF, ECDBench 1.0, Emotional Companionship Dialogue Systems, Evaluation Benchmark
Abstract: With the rapid development of Large Language Models, dialogue systems are shifting from information tools to emotional companions, heralding the era of Emotional Companionship Dialogue Systems (ECDs) that provide personalized emotional support to users. However, the field lacks systematic evaluation standards. To address this, we design and implement the Four-Dimensional Capability Evaluation Framework (FDAEF), which hierarchically integrates a "Capability Layer → Task Layer (three-level) → Data Layer → Method Layer" structure. Building on FDAEF, we present ECDBench 1.0, the first ECD-specific benchmark. Through extensive evaluations of 30 mainstream models, we show that ECDBench 1.0 has strong discriminant validity and can effectively quantify differences in emotional companionship capability across models. The results also reveal current models' shortcomings in deep emotional companionship, guiding future technological optimization and helping developers improve the user experience of ECDs.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: Emotional Companionship Dialogue Systems (ECDs), Affective Computing, LLMs, Agent
Contribution Types: Reproduction study, Publicly available software and/or pre-trained models, Data resources, Data analysis, Position papers
Languages Studied: English, Chinese
Submission Number: 10390