BotsTalk: Machine-Sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets
Abstract: Previous work on open-domain chatbots has introduced dialogue corpora and tasks that aim to equip dialogue systems with different communicative skills, such as being personable, knowledgeable, and empathetic. With the advent of conversational agents grounded in specific skills, a new challenge has emerged: a good open-domain chatbot should retain a well-rounded set of skills and blend them seamlessly into a conversation. To this end, a new dialogue dataset, Blended Skill Talk, was collected via crowdsourcing and is commonly used as a benchmark for multi-skill dialogue generation. However, this data construction approach requires labor-intensive manual annotation, which severely limits its utility for large-scale learning. In this work, we propose BotsTalk, a novel machine-sourced framework in which several agents participate in a conversation to automatically annotate multi-skill dialogues. We then present Blended Skill BotsTalk (BS$\mathbb{B}$T), a large-scale multi-skill dialogue dataset of 200K conversations. Experimental results show that our dataset can be effectively used as training data for multi-skill dialogue systems that require an understanding of both skill blending and grounding. We also demonstrate that the dataset is orthogonally applicable to diverse learning schemes such as fine-tuning and multi-task learning.