Overview of the sixth dialog system technology challenge: DSTC6

Chiori Hori, Julien Perez, Ryuichiro Higashinaka, Takaaki Hori, Y-Lan Boureau, Michimasa Inaba, Yuiko Tsunomori, Tetsuro Takahashi, Koichiro Yoshino, Seokhwan Kim

2019 (modified: 08 Apr 2025), Comput. Speech Lang. 2019, Readers: Everyone

Highlights
• DSTC6: Dialog Challenge to improve the performance of end-to-end dialog systems using neural network models and dialog breakdown detection.
• Track 1, End-to-End Goal Oriented Dialog Learning: selection of the best system response. Hybrid Code Network and Memory Network were the best models.
• Track 2, End-to-End Conversation Modeling: system response generation. 78.5% of the automatically generated sentences were rated as acceptable responses by humans.
• Track 3, Dialogue Breakdown Detection: the submitted systems performed as well as humans in detecting dialog breakdown, for both the English and Japanese datasets.

Abstract: This paper describes the experimental setups and evaluation results of the sixth Dialog System Technology Challenge (DSTC6), which aims to develop end-to-end dialogue systems. Neural network models have become a recent focus of investigation in dialogue technologies. Previous models required training data to be manually annotated with word meanings and dialogue states, whereas end-to-end neural network dialogue systems learn to output natural-language system responses directly, without manually annotated training data. This approach therefore allows us to scale up the size of the training data and cover more dialogue domains. In addition, dialogue systems require a meta-function to avoid delivering inappropriate responses that they themselves generate. To address these issues, DSTC6 consists of three tracks: (1) End-to-End Goal Oriented Dialog Learning, to select system responses; (2) End-to-End Conversation Modeling, to generate system responses using Natural Language Generation (NLG); and (3) Dialogue Breakdown Detection. Since each domain raises different issues for dialogue system development, we targeted restaurant retrieval dialogues with slot-value filling in Track 1, customer service dialogues on Twitter that combine goal-oriented dialogue and chitchat in Track 2, and human-machine chitchat dialogue data in Track 3.
In total, 141 people registered their interest in DSTC6, and 23 teams submitted final results. Eighteen scientific papers were presented at the wrap-up workshop. We find that blending end-to-end trainable models with meaningful prior knowledge performs best for restaurant retrieval in Track 1; indeed, Hybrid Code Network and Memory Network were the best models for this task. In Track 2, 78.5% of the system responses automatically generated by the best system were rated better than acceptable by humans, which is 89% of the number of human responses rated in the same class. In Track 3, the dialogue breakdown detection technologies performed as well as human agreement on both the English and Japanese datasets.
