PRODIGy: a PROfile-based DIalogue Generation datasetDownload PDF

Anonymous

16 Dec 2023ACL ARR 2023 December Blind SubmissionReaders: Everyone
Abstract: Providing dialogue agents with a profile representation can improve their consistency and coherence, leading to better conversations. However, current profile-based dialogue datasets for training such agents contain either explicit profile representations that are simple and dialogue-specific, or implicit representations that are difficult to collect. In this work, we introduce the PRODIGy (PROfile-based DIalogue Generation) dataset, which brings diverse representations together, providing a more comprehensive profile dimension set for each speaker. This resource comprises more than 20k dialogues, sourced from movie scripts, aligned with speaker representations such as communication style, biography, personality and gender. Initial experiments with diverse baselines show that providing generative language models with these aspects of a profile, both separately and jointly, enhances models' performance. This improvement holds true in both in-domain and cross-domain settings, for both fine-tuned and instruction-based LLMs.
Paper Type: long
Research Area: Dialogue and Interactive Systems
Contribution Types: Data resources
Languages Studied: English
0 Replies

Loading