CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Open-Domain Conversation

Anonymous

17 Dec 2021 (modified: 05 May 2023)
ACL ARR 2021 December Blind Submission
Readers: Everyone
Abstract: Recently, the personification and empathy capabilities of dialogue systems have received extensive attention from researchers. Although humans find it straightforward to express themselves personally and empathically, this is highly difficult for dialogue systems because training data do not provide personality or empathy knowledge. In this paper, we propose CPED, a large-scale Chinese personalized and emotional dialogue dataset, which includes multisource knowledge related to empathy and personal characteristics. This knowledge covers 13 emotions, gender, Big Five personality traits, 19 dialogue acts, and other knowledge. CPED contains more than 12K dialogues from 392 speakers across 40 TV shows. We also provide several strong baselines for open-domain conversation generation. The results show that explicitly infusing personalized knowledge and emotional information improves the personification level and empathy ability of dialogue systems, but the infusion method needs further study. The dataset and baselines will be released at https://github.com/***/CPED.
Paper Type: long
Consent To Share Data: yes