Alternating Recurrent Dialog Model with Large-Scale Pre-Trained Language Models

25 Sept 2019 (modified: 22 Oct 2023) · ICLR 2020 Conference Blind Submission · Readers: Everyone
Keywords: NLP, Pre-training, GPT-2, Text Generation, Dialog Generation
TL;DR: We propose a simple, general, and effective dialog response generation framework built on the large pre-trained language model GPT-2.
Abstract: Existing dialog system models require extensive human annotations and are difficult to generalize to different tasks. The recent success of large pre-trained language models such as BERT and GPT-2 has suggested the effectiveness of incorporating language priors in downstream NLP tasks. However, how much pre-trained language models can help dialog response generation is still under exploration. In this paper, we propose a simple, general, and effective framework: Alternating Recurrent Dialog Model (ARDM). ARDM models each speaker separately and takes advantage of large pre-trained language models. It requires no supervision from human annotations such as belief states or dialog acts to achieve effective conversations. ARDM outperforms or is on par with state-of-the-art methods on two popular task-oriented dialog datasets: CamRest676 and MultiWOZ. Moreover, we can generalize ARDM to more challenging, non-collaborative tasks such as persuasion. In persuasion tasks, ARDM is capable of generating human-like responses to persuade people to donate to a charity.
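
As a rough illustration of the idea described in the abstract, the sketch below alternates between two GPT-2 language models, one per speaker, each generating its turn conditioned on the shared dialog history and with no belief-state or dialog-act annotations. The model/variable names, the "User:"/"System:" prompt format, and the decoding settings are assumptions for illustration, not the authors' released implementation (see the Code link below).

```python
# Minimal sketch of ARDM-style alternation (not the authors' code): two GPT-2 LMs,
# one per speaker, each conditioned on the shared dialog history.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
user_lm = GPT2LMHeadModel.from_pretrained("gpt2")    # would be fine-tuned on user turns
system_lm = GPT2LMHeadModel.from_pretrained("gpt2")  # would be fine-tuned on system turns

def generate_turn(model, history, speaker_tag, max_new_tokens=40):
    """Generate one turn for the given speaker, conditioned on the full history."""
    prompt = history + f"{speaker_tag}:"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Keep only the newly generated tokens and cut at the first newline.
    turn = tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
    return turn.split("\n")[0].strip()

history = "User: I'd like to book a table for two tonight.\n"
for _ in range(2):  # alternate: the system model responds, then the user model replies
    sys_turn = generate_turn(system_lm, history, "System")
    history += f"System: {sys_turn}\n"
    usr_turn = generate_turn(user_lm, history, "User")
    history += f"User: {usr_turn}\n"
print(history)
```

With off-the-shelf GPT-2 weights the generated turns will be generic; the sketch only shows the alternating, per-speaker decoding loop that the framework relies on.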
Code: https://anonymous.4open.science/r/99c2260f-b85c-4ed7-9067-3333e7ac14ce/
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:1910.03756/code)