Controlled Decoding from Language Models

Sidharth Mudgal; Jong Lee; Harish Ganapathy; YaGuang Li; Tao Wang; Yanping Huang; Zhifeng Chen; Heng-Tze Cheng; Michael Collins; Jilin Chen; Alex Beutel; Ahmad Beirami

Published: 23 Oct 2023, Last Modified: 28 Nov 2023, SoLaR Spotlight

Keywords: Controlled decoding, language model alignment, reinforcement learning, off-policy q-learning

TL;DR: We propose a new alignment technique, which is an off-policy variant of reinforcement learning and outperforms the widely used KL-regularized PPO.

Abstract: We propose controlled decoding (CD), a novel off-policy reinforcement learning method to control the autoregressive generation from language models towards high reward outcomes. CD solves an off-policy reinforcement learning problem through a value function for the reward, which we call a prefix scorer. The prefix scorer is used at inference time to steer the generation towards higher reward outcomes. We show that the prefix scorer may be trained on (possibly) off-policy data to predict the expected reward when decoding is continued from a partially decoded response. We empirically demonstrate that CD is effective as a control mechanism on the Reddit conversations corpus. We also show that the modularity of the design of CD makes it possible to control for multiple rewards, effectively solving a multi-objective reinforcement learning problem with no additional complexity. Finally, we show that CD can be applied in a novel blockwise fashion at inference time, again without the need for any training-time changes, essentially bridging the gap between the popular sequence-level best-of-k strategy and token-level reinforcement learning. This makes CD a promising approach for alignment of language models.
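
The blockwise use of a prefix scorer described in the abstract can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: the selection rule (sample several candidate blocks and keep the one the prefix scorer rates highest) is an assumption, and `blockwise_decode`, `toy_sample_block`, and `toy_prefix_scorer` are hypothetical names with toy stand-ins for the language model and the trained value function.

```python
import random
from typing import Callable, List, Sequence

Token = int


def blockwise_decode(
    sample_block: Callable[[Sequence[Token], int, random.Random], List[Token]],
    prefix_scorer: Callable[[Sequence[Token]], float],
    prompt: Sequence[Token],
    max_new_tokens: int,
    block_size: int,
    num_candidates: int,
    seed: int = 0,
) -> List[Token]:
    """Grow a response block by block, keeping the best-scored candidate block.

    Assumed selection rule (not stated in the abstract): at each step, draw
    `num_candidates` blocks from the base model and keep the one whose
    extended prefix gets the highest score from the prefix scorer.
    """
    rng = random.Random(seed)
    response: List[Token] = []
    while len(response) < max_new_tokens:
        n = min(block_size, max_new_tokens - len(response))
        context = list(prompt) + response
        candidates = [sample_block(context, n, rng) for _ in range(num_candidates)]
        best = max(candidates, key=lambda block: prefix_scorer(context + block))
        response.extend(best)
    return response


# Toy stand-ins (hypothetical): a uniform "language model" over token ids 0..9
# and a "prefix scorer" that simply prefers larger token ids.
def toy_sample_block(context: Sequence[Token], n: int, rng: random.Random) -> List[Token]:
    return [rng.randrange(10) for _ in range(n)]


def toy_prefix_scorer(prefix: Sequence[Token]) -> float:
    return float(sum(prefix))


if __name__ == "__main__":
    prompt = [1, 2, 3]
    # block_size == max_new_tokens: one block covers the whole response,
    # which corresponds to sequence-level best-of-k.
    best_of_k = blockwise_decode(toy_sample_block, toy_prefix_scorer, prompt, 12, 12, 8)
    # Smaller blocks apply the scorer more often during generation.
    blockwise = blockwise_decode(toy_sample_block, toy_prefix_scorer, prompt, 12, 4, 8)
    print(best_of_k, blockwise)
```

In this sketch, setting `block_size` equal to `max_new_tokens` reduces to best-of-k over whole responses, while `block_size=1` makes a selection at every token, which is the range the abstract describes the blockwise variant as bridging.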

Submission Number: 25
