Abstract: Aligning large language models (LLMs) with human preferences is critical to enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly optimizes a single fixed reward learned from averaged human ratings, which limits adaptivity and controllability with respect to varying preferences. Creating personalized LLMs, however, requires aligning them with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences over multi-objective trade-offs, such as prioritizing humor and empathy in one context while seeking efficiency and precision in another. Can we train one LLM to produce personalized outputs for different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains an LLM as a meta-policy that directly generates responses in preference-defined regions of the Pareto front. Our approach integrates multi-objective optimization (MOO) principles into Proximal Policy Optimization (PPO) to train an LLM as a preference-conditioned policy network. We improve the computational efficiency of MOC by applying MOO at the policy level, which enables us to fine-tune an LLM of 7B parameters on a single A6000 GPU. Extensive experiments demonstrate the advantages of MOC over baselines in three aspects: (i) controllability of LLM outputs w.r.t. user preferences on the trade-off among multiple rewards; (ii) quality and diversity of LLM outputs, measured by the hypervolume of the achieved solution set; and (iii) generalization to unseen preferences. These results highlight MOC's potential for real-world applications requiring scalable and customizable LLMs.
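To make the hypervolume criterion mentioned above concrete, here is a minimal sketch (not the authors' implementation) of how the hypervolume of a set of solutions can be computed against a reference point in the two-objective case; the objective names, reward values, and reference point below are illustrative assumptions, and the paper's experiments may involve more objectives.

```python
# Minimal sketch, assuming two objectives to be maximized and a user-chosen
# reference point; all reward values below are illustrative.

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (maximization)."""
    # Keep only points that strictly dominate the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest, stacking rectangles.
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

# Hypothetical (helpfulness, humor) rewards of responses generated under
# three different preference vectors.
solutions = [(0.9, 0.2), (0.6, 0.6), (0.3, 0.8)]
print(hypervolume_2d(solutions, ref=(0.0, 0.0)))  # larger = better quality and diversity
```

A larger hypervolume indicates that the preference-conditioned policy covers more of the Pareto front, which is how the abstract frames quality and diversity of the outputs.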
Keywords: controllable language models, reinforcement learning from human feedback
TL;DR: We introduce Multi-Objective Control to steer language models to adaptively generate personalized responses aligned with user-specific preferences on the Pareto front.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2360