Controlling Pretrained Language Generation Models by Learning to Focus

Anonymous

16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission, Readers: Everyone

Abstract: Transformer-based language models, which are pretrained on large-scale unsupervised data and then finetuned on task-specific datasets, have become the dominant paradigm for various natural language generation tasks. Such models are typically finetuned and used in an end-to-end manner. This work attempts to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on while generating output text. To achieve this goal, we augment a pretrained model with trainable "attention vectors" that are applied directly to the model's embeddings, while the model itself is kept fixed. These vectors, trained on automatic annotations derived from attribution methods, act as indicators of context importance. We test our approach on two core generation tasks: dialogue response generation and abstractive summarization. We also collect evaluation data in which the highlight-generation pairs are annotated by humans. Our experiments show that the trained attention vectors are effective in steering the model to generate outputs that are relevant to user-selected highlights.
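
To make the idea of applying a vector to the embeddings of highlighted tokens concrete, here is a minimal sketch in plain Python. The abstract does not specify how the attention vectors are applied, so the element-wise scaling used here is an assumption, and the names `apply_focus`, `highlight_mask`, and `focus_vector` are hypothetical. In the setting the abstract describes, only the vector would be trained while the pretrained model's weights stay fixed; this sketch covers the forward step only, not training.

```python
from typing import List, Sequence

Vector = List[float]


def apply_focus(
    embeddings: Sequence[Vector],
    highlight_mask: Sequence[bool],
    focus_vector: Vector,
) -> List[Vector]:
    """Scale the embeddings of highlighted tokens by a focus vector.

    Assumption: element-wise scaling is one plausible way to "apply" a
    vector to an embedding; the abstract does not state the operation.
    """
    if len(embeddings) != len(highlight_mask):
        raise ValueError("one mask entry is required per token")
    focused = []
    for emb, highlighted in zip(embeddings, highlight_mask):
        if len(emb) != len(focus_vector):
            raise ValueError("focus vector must match the embedding width")
        if highlighted:
            # Highlighted token: rescale each embedding dimension.
            focused.append([e * f for e, f in zip(emb, focus_vector)])
        else:
            # Non-highlighted token: pass the embedding through unchanged.
            focused.append(list(emb))
    return focused


# Toy input: three tokens with 2-dimensional embeddings; the user
# highlights only the second token.
embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
highlight_mask = [False, True, False]
focus_vector = [2.0, 0.5]

focused = apply_focus(embeddings, highlight_mask, focus_vector)
```

Only the highlighted token's embedding changes in this toy run; the rest of the context reaches the frozen model untouched, which is what lets a user-selected span act as a steering signal.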
