Keywords: Multi-objective reinforcement learning, reward-free reinforcement learning
Abstract: Many sequential decision-making tasks involve optimizing multiple conflicting objectives, requiring policies that adapt to different user preferences. Multi-objective reinforcement learning (MORL) typically addresses this by training a single policy conditioned on preference-weighted rewards. In this paper, we explore a novel perspective: leveraging reward-free reinforcement learning (RFRL) for MORL. While RFRL has historically been studied independently of MORL, it learns optimal policies for any possible reward function, making it a natural fit for MORL's challenge of handling unknown user preferences. We propose using RFRL's training objective as an auxiliary task to enhance MORL, enabling more effective knowledge sharing beyond the multi-objective reward function given at training time. To this end, we adapt a state-of-the-art RFRL algorithm to the MORL setting and introduce a preference-guided exploration strategy that focuses learning on the relevant parts of the environment. Our approach significantly outperforms state-of-the-art MORL methods across diverse MO-Gymnasium tasks, achieving superior performance and data efficiency, especially in settings with limited preference samples. This work is the first to explicitly adapt RFRL for MORL, demonstrating its potential as a scalable and effective solution.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17250