Preference Controllable Reinforcement Learning with Advanced Multi-Objective Optimization

Published: 01 May 2025, Last Modified: 23 Jul 2025 · ICML 2025 poster · CC BY-NC-ND 4.0
TL;DR: A novel Multi-Objective Reinforcement Learning (MORL) method that discovers more Pareto-optimal solutions than most previous MORL approaches.
Abstract: Practical reinforcement learning (RL) usually requires agents to be optimized for multiple, potentially conflicting criteria, e.g., speed vs. safety. Although Multi-Objective RL (MORL) algorithms have been studied in previous works, their trained agents often cover only a limited set of Pareto-optimal solutions and lack precise control over the delicate trade-off among multiple objectives. Hence, the resulting agent is not versatile in aligning with customized requests from different users. To bridge this gap, we develop the "Preference Controllable (PC) RL" framework, which trains a preference-conditioned meta-policy that takes a user preference as input and steers the generated trajectories toward the corresponding preference region on the Pareto frontier. The PCRL framework is compatible with advanced Multi-Objective Optimization (MOO) algorithms that are rarely seen in previous MORL approaches. We also propose a novel preference-regularized MOO algorithm designed specifically for PCRL. We provide a comprehensive theoretical analysis to justify its convergence and preference controllability. We evaluate PCRL with different MOO algorithms against state-of-the-art MORL baselines in various challenging environments with up to six objectives. In these experiments, our proposed method exhibits significantly better controllability than existing approaches and generates Pareto solutions with better diversity and utility.
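To make the idea of a preference-conditioned meta-policy concrete, here is a minimal sketch, assuming a simple feed-forward architecture rather than the paper's actual design: a single policy network concatenates the state with a user-supplied preference vector from the probability simplex, so one set of weights can be steered toward different trade-offs at inference time. All names (PreferenceConditionedPolicy, state_dim, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the paper's implementation): a policy conditioned on
# a user preference vector w over objectives, so the same weights can be
# steered toward different trade-offs at inference time.
import torch
import torch.nn as nn

class PreferenceConditionedPolicy(nn.Module):  # hypothetical name
    def __init__(self, state_dim: int, num_objectives: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_objectives, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, preference: torch.Tensor) -> torch.Tensor:
        # preference: non-negative weights over objectives, summing to 1
        return self.net(torch.cat([state, preference], dim=-1))

# Usage: the same policy, steered by two different user preferences.
policy = PreferenceConditionedPolicy(state_dim=8, num_objectives=2, action_dim=4)
state = torch.randn(1, 8)
for w in (torch.tensor([[0.9, 0.1]]), torch.tensor([[0.2, 0.8]])):
    logits = policy(state, w)  # action logits under preference w
```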
Lay Summary: Previous mainstream Multi-Objective Reinforcement Learning (MORL) methods have overlooked recent advances in Multi-Objective Optimization (MOO). They typically rely on basic objectives such as linear scalarization, which can only discover a limited set of optimal solutions and fails to guarantee a desired trade-off among multiple objectives. These methods also struggle with issues such as conflicting gradients that arise from conflicting objectives. In this work, we incorporate recent developments from the MOO literature to design a general MORL framework that can leverage advanced MOO algorithms. Furthermore, we introduce a novel MOO algorithm tailored for this framework, supported by both theoretical analysis and empirical results that demonstrate its improvements over existing MORL approaches. Our method's memory-efficient design also makes it practical for use with larger models.
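As a point of reference for what linear scalarization means here, the sketch below is an illustrative assumption, not the paper's algorithm: it collapses a vector reward into a single scalar with fixed preference weights. A policy optimal for such a fixed weighting always lies on the convex hull of the Pareto front, which is why this approach can miss solutions in concave regions of the front.

```python
# Illustrative sketch of linear scalarization; not the paper's method.
import numpy as np

def linear_scalarization(reward_vector: np.ndarray, weights: np.ndarray) -> float:
    """Collapse a multi-objective reward into one scalar, r_w = w . r.
    Weights are assumed non-negative and summing to 1."""
    return float(np.dot(weights, reward_vector))

# Two candidate vector returns and a fixed preference; optimizing the
# scalarized objective can only select points on the convex hull of the front.
r_a, r_b = np.array([1.0, 0.2]), np.array([0.5, 0.8])
w = np.array([0.5, 0.5])
best = max((r_a, r_b), key=lambda r: linear_scalarization(r, w))
```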
Primary Area: Reinforcement Learning->Everything Else
Keywords: Multi-Objective Optimization, Multi-Objective Reinforcement Learning
Submission Number: 13270