One cannot stand for everyone! Leveraging Multiple User Simulators to train Task-oriented Dialogue SystemsDownload PDF

Published: 01 Feb 2023, Last Modified: 13 Feb 2023Submitted to ICLR 2023Readers: Everyone
Keywords: User simulators, Task-oriented Dialogue Systems.
TL;DR: We propose to leverage multiple user simulators simultaneously to optimize ToD systems, leading to a framework called MUST.
Abstract: User simulators are agents designed to imitate human users; recent advances have found that Task-oriented Dialogue (ToD) systems optimized toward a user simulator could better satisfy the need of human users. However, this might result in a sub-optimal ToD system if it is tailored to only one \textit{ad hoc} user simulator, since human users can behave differently. In this paper, we propose a framework called MUST to optimize ToD systems via leveraging \textbf{m}ultiple \textbf{u}ser \textbf{s}imula\textbf{t}ors. The main challenges of MUST fall in 1) how to adaptively specify which user simulator to interact with the ToD system at each optimization step, since the ToD system might be over-fitted to some specific user simulators, and simultaneously under-fitted to some others; 2) how to avoid catastrophic forgetting of the adaption for a simulator that is not selected for several consecutive optimization steps. To tackle these challenges, we formulate MUST as a Multi-armed bandits (MAB) problem and provide a method called MUST$_{\mathrm{adaptive}}$ that balances \textit{i}) the \textit{boosting adaption} for adaptive interactions between different user simulators and the ToD system and \textit{ii}) the \textit{uniform adaption} to avoid the catastrophic forgetting issue. With both automatic evaluations and human evaluations, our extensive experimental results on the restaurant search task from MultiWOZ show that the dialogue system trained by our proposed MUST achieves a better performance than those trained by any single user simulator. It also has a better generalization ability when testing with unseen user simulators. Moreover, our method MUST$_{\mathrm{adaptive}}$ is indeed more efficient and effective to leverage multiple user simulators by our visualization analysis.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Supplementary Material: zip
14 Replies

Loading