Linearly Controlled Language Generation with Performative Guarantees

Published: 09 Oct 2024, Last Modified: 15 Dec 2024 · MINT@NeurIPS 2024 · CC BY 4.0
Keywords: optimal control, large language models
Abstract: With the increased use of Large Language Models (LMs) comes a need for controlled text generation strategies with performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. We take the view that each natural-language token generation traces a trajectory in this continuous space, realized by the LM's hidden-layer activations. This view permits a control-theoretic treatment of text generation in latent space, where we propose a lightweight, gradient-free intervention that is guaranteed (in probability) to steer trajectories away from regions corresponding to undesired meanings. We demonstrate on toxicity and negativity use cases that the intervention steers language away from undesired content while maintaining text quality.
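The abstract describes a gradient-free intervention on hidden-layer activations that steers trajectories away from a linearly represented undesired concept. A minimal sketch of one such linear intervention is below; the function name, threshold `tau`, and step size `alpha` are illustrative assumptions, not the paper's actual method or guarantees.

```python
import numpy as np

def steer_away(h, w, tau=0.0, alpha=1.0):
    """Hypothetical sketch: shift hidden state h away from the region
    where its coordinate along concept direction w exceeds tau.
    Gradient-free: only a projection and a linear shift are used."""
    w_hat = w / np.linalg.norm(w)   # unit vector for the concept direction
    c = float(h @ w_hat)            # scalar coordinate of h along w
    if c > tau:
        # Subtract the excess component so the steered state lies
        # at (or below, for alpha > 1) the threshold tau.
        h = h - alpha * (c - tau) * w_hat
    return h

# Toy usage: a state with a large component along the concept direction
rng = np.random.default_rng(0)
w = rng.normal(size=8)
h = rng.normal(size=8) + 2.0 * w / np.linalg.norm(w)
h_steered = steer_away(h, w, tau=0.0)
```

With `alpha=1.0`, the steered state's coordinate along the concept direction is clipped exactly to `tau`; smaller `alpha` gives a softer push, a design choice that trades steering strength against preservation of the original activation.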
Email Of Author Nominated As Reviewer: emilyshana.cheng@upf.edu
Submission Number: 26