Linearly Controlled Language Generation with Performative Guarantees

Published: 09 Oct 2024, Last Modified: 15 Dec 2024 · MINT@NeurIPS 2024 · CC BY 4.0
Keywords: optimal control, large language models
Abstract: With the increased use of Large Language Models (LMs) comes a need for controlled text generation strategies with performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. We take the view that each natural-language token generation traces a trajectory in this continuous space, realized by the LM's hidden-layer activations. This view permits a control-theoretic treatment of text generation in latent space, where we propose a lightweight, gradient-free intervention that is guaranteed (in probability) to steer trajectories away from regions corresponding to undesired meanings. We demonstrate on toxicity and negativity use cases that the intervention steers language away from undesired content while maintaining text quality.
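The abstract describes a gradient-free intervention on hidden-layer activations that steers trajectories away from a linearly represented undesired concept. A minimal sketch of one such linear intervention is below; the function name, threshold `tau`, and step size `alpha` are illustrative assumptions, not the paper's actual method or guarantees.

```python
import numpy as np

def steer_away(h, w, tau=0.0, alpha=1.0):
    """Hypothetical sketch: shift hidden state h away from the region
    where its coordinate along concept direction w exceeds tau.
    Gradient-free: only a projection and a linear shift are used."""
    w_hat = w / np.linalg.norm(w)   # unit vector for the concept direction
    c = float(h @ w_hat)            # scalar coordinate of h along w
    if c > tau:
        # Subtract the excess component so the steered state lies
        # at (or below, for alpha > 1) the threshold tau.
        h = h - alpha * (c - tau) * w_hat
    return h

# Toy usage: a state with a large component along the concept direction
rng = np.random.default_rng(0)
w = rng.normal(size=8)
h = rng.normal(size=8) + 2.0 * w / np.linalg.norm(w)
h_steered = steer_away(h, w, tau=0.0)
```

With `alpha=1.0`, the steered state's coordinate along the concept direction is clipped exactly to `tau`; smaller `alpha` gives a softer push, a design choice that trades steering strength against preservation of the original activation.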
Email Of Author Nominated As Reviewer: emilyshana.cheng@upf.edu
Submission Number: 26