Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning

Published: 10 Oct 2024 · Last Modified: 15 Nov 2024 · Pluralistic-Alignment 2024 · CC BY 4.0
Keywords: Reinforcement Learning, Multi-Objective Finetuning, Multi-task Learning, Parameter Efficient Training
TL;DR: We present CLP, a framework that enables efficiently trading off conflicting objectives (e.g., creativity vs. factuality) at inference time by drawing on multi-task learning and parameter-efficient finetuning.
Abstract: Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditional Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Building on techniques from multi-task training and parameter-efficient finetuning, CLP learns steerable models that effectively trade off conflicting objectives at inference time. Notably, this does not require training or maintaining multiple models to achieve different trade-offs between the objectives. Through extensive experiments and ablations on two summarization datasets, we show that CLP learns steerable language models that outperform and Pareto-dominate existing approaches to multi-objective finetuning.
Submission Number: 9
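
The abstract describes conditioning a single policy on a preference over objectives so that the trade-off can be chosen at inference time without retraining. The snippet below is a minimal, hypothetical sketch of that general idea (linear scalarization of two rewards plus a weight-conditioned policy, trained REINFORCE-style); the class and function names (`ConditionalPolicy`, `combined_reward`, the "creativity"/"factuality" reward proxies) are illustrative placeholders and not the paper's actual method or API.

```python
# Hypothetical sketch: sample a preference weight w each step, condition the
# policy on w, and optimize the weighted combination of two reward signals.
import torch
import torch.nn as nn


class ConditionalPolicy(nn.Module):
    """Toy policy whose action logits are modulated by a scalar preference weight."""

    def __init__(self, vocab_size: int = 100, hidden: int = 32):
        super().__init__()
        self.embed_w = nn.Linear(1, hidden)    # embed the preference weight
        self.backbone = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.backbone(self.embed_w(w)))
        return self.head(h)                    # logits conditioned on w


def combined_reward(w, r1, r2):
    # Linear scalarization of two (possibly conflicting) objectives.
    return w * r1 + (1.0 - w) * r2


policy = ConditionalPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(100):
    w = torch.rand(1, 1)                       # sample a trade-off each step
    dist = torch.distributions.Categorical(logits=policy(w))
    action = dist.sample()
    # Placeholder reward proxies standing in for learned reward models.
    r1 = action.float() / 100.0                # e.g. a "creativity" proxy
    r2 = 1.0 - action.float() / 100.0          # e.g. a "factuality" proxy
    loss = (-dist.log_prob(action) * combined_reward(w, r1, r2).detach()).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference, choose w to steer the trade-off without retraining:
logits_creative = policy(torch.tensor([[0.9]]))
logits_factual = policy(torch.tensor([[0.1]]))
```

In this sketch, steerability comes from exposing the sampled weight to the policy during training, so a single set of parameters covers a family of trade-offs; the paper additionally leverages parameter-efficient finetuning, which is not reflected here.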