Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models

Sep 25, 2019 Withdrawn Submission readers: everyone
  • TL;DR: We identify the forgetting problem in fine-tuning of pre-trained NLG models, and propose the mix-review strategy to address it.
  • Abstract: In this work, we study how the large-scale pretrain-finetune framework changes the behavior of a neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. We find that after standard fine-tuning, the model forgets important language generation skills acquired during large-scale pre-training. We demonstrate the forgetting phenomenon through a detailed behavior analysis from the perspectives of context sensitivity and knowledge transfer. Adopting the concept of data mixing, we propose an intuitive fine-tuning strategy named "mix-review". We find that mix-review effectively regularizes the fine-tuning process, and the forgetting problem is largely alleviated. Finally, we discuss interesting behavior of the resulting dialogue model and its implications.
  • Keywords: language generation, forgetting, pretraining, open-domain dialogue