LM-Switch: Transforming Word Embedding Space for Flexible Language Model Steering

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Language Model, Word Embeddings, Representation Interpretation, Model Control
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a theoretically grounded yet lightweight method for efficient language model conditioning, and demonstrate its performance, efficiency, inter-model transferability, and capacity for continuous and compositional control.
Abstract: Large language models (LLMs) have advanced significantly as general-purpose tools. Varied real-life demands, ranging from risk management for specific audiences to customizing text styles for different scenarios, all necessitate adapting general-purpose LLMs to different conditions. However, existing pre-training or fine-tuning solutions are still not efficient or flexible enough, and can compromise LLMs’ original quality, while applying classifiers as constraints entails an expensive decoding process. We motivate our approach by theoretically interpreting the role of word embeddings in modeling the output distribution. By analyzing a variant of Hidden Markov Models (HMMs), we find that different conditions in HMMs can, surprisingly, be understood as linear transformations in the output word embedding space. This finding inspires LM-Switch, a novel, theoretically grounded, lightweight, transferable, and flexible method for conditioning generative language models. LM-Switch simply applies a linear transformation in the output word embedding space. It achieves performance comparable or superior to state-of-the-art baselines on LM detoxification and sentiment control while maintaining a better balance with generation quality, despite training only 0.2% of model parameters. It can also learn from a few sentences or a single document. One can continuously steer LLMs by scaling the transformation, or compose multiple conditions by adding their transformations. Moreover, a learned LM-Switch can be transferred to other LLMs of different sizes. We will make our code available to the research community following publication.
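To make the described mechanism concrete, below is a minimal, illustrative sketch (not the authors' released implementation) of how a linear switch in the output word embedding space could be applied. The class name `LMSwitch`, the parameter names `W` and `epsilon`, and the interface taking hidden states `h` and an output embedding matrix `E` are all assumptions for illustration.

```python
import torch


class LMSwitch(torch.nn.Module):
    """Sketch of a linear switch applied in the output word embedding space.

    Assumes the base LM exposes hidden states `h` of shape (batch, seq, d)
    and an output embedding matrix `E` of shape (vocab, d).
    """

    def __init__(self, d_model: int):
        super().__init__()
        # A single learned d x d matrix: a small fraction of the LM's parameters.
        self.W = torch.nn.Parameter(torch.zeros(d_model, d_model))

    def logits(self, h: torch.Tensor, E: torch.Tensor, epsilon: float = 1.0) -> torch.Tensor:
        # Conditioned logits of the form E (I + epsilon * W) h.
        # Scaling epsilon steers the model continuously; epsilon = 0 recovers
        # the unconditioned base LM.
        switched_h = h + epsilon * (h @ self.W.T)
        return switched_h @ E.T


# Compositional control (as described in the abstract) would amount to summing
# the learned transformations of individual conditions, e.g. using
# W = W_condition_a + W_condition_b inside the same formula.
```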
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5849