LLE-MORL: Locally Linear Extrapolation of Policies for Efficient Multi-Objective Reinforcement Learning

ICLR 2026 Conference Submission 21216 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-objective Optimization, Reinforcement Learning, Interpretability, Pareto Front
Abstract: Multi-objective reinforcement learning (MORL) aims to optimise several, often conflicting objectives in order to improve the flexibility and reliability of RL in practical tasks. This can be achieved by finding a diverse set of policies, each optimal for some objective preference and non-dominated by the optimal policies for other preferences, so that together they form a Pareto front in the multi-objective performance space. The relation between the multi-objective performance space and the parameter space that represents the policies is generally non-unique; we provide new insights into this relation by formalising a local parameter-performance relationship. Using a training scheme based on this local parameter-performance relationship, we propose LLE-MORL, a method that directly extrapolates a small set of base policies to efficiently trace out a high-quality Pareto front. Experiments across different domains, conducted both with and without retraining, show that LLE-MORL consistently achieves higher Pareto-front quality and efficiency than state-of-the-art approaches.
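The abstract's core idea, extrapolating base policies in parameter space to trace the Pareto front, can be illustrated with a minimal sketch. The following is not the authors' implementation; the function name, the choice of NumPy, and the extrapolation range are illustrative assumptions showing only the general notion of locally linear interpolation/extrapolation between two base policies' parameter vectors.

```python
# Hypothetical sketch (not the paper's algorithm): given two base policies trained
# for different objective preferences, candidate policies are formed by linear
# interpolation/extrapolation of their parameter vectors; each candidate would then
# be evaluated in the multi-objective performance space.
import numpy as np

def linear_policy_candidates(theta_i: np.ndarray, theta_j: np.ndarray, alphas):
    """Generate candidate parameter vectors along the line through theta_i and theta_j.

    alpha in [0, 1] interpolates between the two base policies;
    alpha > 1 (or < 0) extrapolates beyond them.
    """
    direction = theta_j - theta_i
    return [theta_i + a * direction for a in alphas]

# Example with dummy 4-dimensional policy parameters (assumed values).
theta_i = np.array([0.2, -1.0, 0.5, 0.3])   # base policy for preference i
theta_j = np.array([0.6, -0.4, 0.1, 0.9])   # base policy for preference j
candidates = linear_policy_candidates(theta_i, theta_j,
                                       alphas=np.linspace(-0.5, 1.5, 9))
# Non-dominated candidates would approximate a local segment of the Pareto front.
```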
Primary Area: reinforcement learning
Submission Number: 21216