Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models

Tianxing He; Jun Liu; Kyunghyun Cho; Myle Ott; Bing Liu; James Glass; Fuchun Peng

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Withdrawn Submission | Readers: Everyone

TL;DR: We identify the forgetting problem in fine-tuning of pre-trained NLG models, and propose the mix-review strategy to address it.

Abstract: In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named "mix-review". We find that mix-review effectively regularizes the fine-tuning process and largely alleviates the forgetting problem. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.

Keywords: language generation, forgetting, pretraining, open-domain dialogue
