Self-conditioning pre-trained language models

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: We present a method to condition pre-trained Transformer-based Language Models without fine-tuning or adding parameters. Our approach leverages \emph{expert units} already present in the model that can be used to steer text generation. We describe how to identify such expert units and propose an inference-time intervention on them that allows conditioning. Results show that our method is effective for conditioning, even on fine-grained homograph concepts. Furthermore, using a large corpus of contexts, we highlight the inherited gender bias present in the output of an unconditioned model. Our experiments show that our method can correct this behaviour and achieve gender parity for all of the contexts. We compare our method with PPLM-BoW (Dathathri et al., 2020) and show that our approach reaches parity with much lower perplexity. The proposed method is accessible to a wide audience thanks to its simplicity and minimal compute needs.
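As a rough illustration of the kind of inference-time intervention described in the abstract, the sketch below clamps the activations of a few hypothetical expert units in GPT-2 using PyTorch forward hooks. The layer indices, neuron indices, and clamp value are illustrative assumptions, not values taken from the paper, and this is not the authors' released implementation.

```python
# Minimal sketch: force a handful of assumed "expert unit" activations to a
# fixed value at inference time, leaving all model weights untouched.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Assumed: units previously identified as experts for a target concept,
# given as (layer_index, neuron_index) pairs, plus a value to clamp them to.
expert_units = [(5, 1203), (7, 87)]   # hypothetical indices
clamp_value = 4.0                     # hypothetical intervention value

def make_hook(neuron_idx):
    def hook(module, inputs, output):
        # Overwrite the chosen hidden unit at every position with the clamp value.
        output[..., neuron_idx] = clamp_value
        return output
    return hook

# Register hooks on the first MLP projection of each selected layer.
handles = [
    model.transformer.h[layer].mlp.c_fc.register_forward_hook(make_hook(idx))
    for layer, idx in expert_units
]

prompt = tokenizer("The weather today", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**prompt, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Remove the hooks to recover the unconditioned model.
for h in handles:
    h.remove()
```

Because the intervention is applied only through hooks at generation time, no fine-tuning or additional parameters are involved; removing the hooks restores the original model.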