Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture
Abstract: The large pre-trained language model GPT-2 has been fine-tuned for task-oriented dialog systems and has achieved state-of-the-art performance on many datasets. However, there is little work on reinforcement learning (RL) for these GPT-2 based dialog systems, let alone on designing a GPT-2 based user simulator. In this paper, we propose a dialog system and a user simulator, both based on GPT-2, with a simplified generative architecture for reinforcement learning. Experiments are conducted on MultiWOZ2.1, and we evaluate our system with both an offline method and an online method. The results show that our dialog system achieves the best performance among all GPT-2 based models even without RL optimization, and that its performance improves further after RL. We also explore different reward settings in RL and provide an in-depth analysis of how the model attends to different information and how RL improves the performance of the dialog system.
Paper Type: long