- Abstract: Large pretrained language representation models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that finetune a pretrained model. However, such transfer learning approaches have not seen the same success for natural language generation. In this work, we explore transfer learning for conditional generation with large pretrained language models. We propose a simple modification to a pretrained unconditional transformer model that injects arbitrary conditioning into the self-attention layer, an approach we call pseudo self attention. Through experiments on four long-form conditional text generation tasks, we show that this technique outperforms strong baselines and other transfer learning approaches, and produces coherent generations.
- Keywords: transfer learning, NLP, BERT, GPT
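
The abstract describes pseudo self attention only as injecting conditioning into the self attention layer of a pretrained transformer. One way this could look is sketched below for a single attention head in NumPy. The sketch assumes the conditioning vectors are projected by newly added key and value matrices and prepended to the pretrained keys and values, so every target position can attend to all conditioning positions and, causally, to earlier target positions. The names `pseudo_self_attention`, `Uk`, and `Uv` are hypothetical and chosen for illustration; the abstract itself does not specify these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_self_attention(Y, X, Wq, Wk, Wv, Uk, Uv):
    """Single-head sketch (illustrative assumption, not the authors' code).

    Y: (T, d)   hidden states of the target sequence
    X: (S, dx)  conditioning representation (e.g. from any encoder)
    Wq, Wk, Wv: (d, dh)  stand-ins for the pretrained projections
    Uk, Uv:     (dx, dh) hypothetical new projections for the conditioning
    Returns an array of shape (T, dh).
    """
    T, S = Y.shape[0], X.shape[0]
    q = Y @ Wq
    # Prepend projected conditioning to the keys and values.
    k = np.concatenate([X @ Uk, Y @ Wk], axis=0)  # (S + T, dh)
    v = np.concatenate([X @ Uv, Y @ Wv], axis=0)  # (S + T, dh)
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (T, S + T)
    # Conditioning positions are always visible; target positions are causal.
    causal = np.tril(np.ones((T, T), dtype=bool))
    mask = np.concatenate([np.ones((T, S), dtype=bool), causal], axis=1)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

# Toy usage with random weights standing in for pretrained parameters.
rng = np.random.default_rng(0)
T, S, d, dx, dh = 4, 3, 8, 5, 6
Y = rng.normal(size=(T, d))
X = rng.normal(size=(S, dx))
Wq, Wk, Wv = (rng.normal(size=(d, dh)) for _ in range(3))
Uk, Uv = (rng.normal(size=(dx, dh)) for _ in range(2))
out = pseudo_self_attention(Y, X, Wq, Wk, Wv, Uk, Uv)
print(out.shape)
```

Under these assumptions, the only new parameters are `Uk` and `Uv`. With an empty conditioning input (`S = 0`) the computation reduces to ordinary causal self attention over the target sequence, which is what would let the pretrained weights be reused unchanged.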