Utilising the Parameter-Performance Relationship for Efficient Multi-Objective Reinforcement Learning

Published: 17 Jul 2025 · Last Modified: 06 Sept 2025 · EWRL 2025 Poster · CC BY 4.0
Keywords: Multi-objective Optimization, Reinforcement Learning, Pareto Front
Abstract: Multi-objective reinforcement learning (MORL) aims to identify diverse optimal policies that form a Pareto front, balancing different and often conflicting objectives. The complex mapping between the policy parameter space and the multi-objective performance space poses significant challenges for efficient exploration. This work formally introduces and exploits the Parameter-Performance Relationship (PPR), proposing that an understanding of its local structure enables more efficient MORL. We present LLE-MORL, an algorithm that realises the PPR through locally linear extensions. Using a few initial policies and their briefly retrained variants to define extension directions, our method efficiently generates candidate policies along the Pareto front with minimal additional training. Experiments on continuous control benchmarks show that our approach discovers high-quality, comprehensive Pareto fronts more efficiently than existing methods, demonstrating that systematically leveraging the PPR provides a powerful strategy for advancing MORL.
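
A minimal sketch of the locally linear extension step described in the abstract, assuming policies are represented as flat parameter vectors. The function name, the step-size schedule, and the toy vectors below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def linear_extension_candidates(theta, theta_retrained, step_sizes):
    """Generate candidate policy parameters along the direction defined by
    a base policy and its briefly retrained variant (hypothetical interface).

    theta           -- flat parameter vector of an initial Pareto-optimal policy
    theta_retrained -- parameters of the same policy after brief retraining
                       toward a shifted objective weighting
    step_sizes      -- extension magnitudes alpha along the local direction
    """
    direction = theta_retrained - theta  # local extension direction in parameter space
    return [theta + alpha * direction for alpha in step_sizes]

# Toy example: extend a 4-parameter policy along its retraining direction.
theta = np.array([0.1, -0.3, 0.7, 0.2])
theta_retrained = np.array([0.15, -0.25, 0.65, 0.3])
for candidate in linear_extension_candidates(theta, theta_retrained,
                                             step_sizes=[0.5, 1.0, 2.0, 4.0]):
    print(candidate)  # each candidate would be evaluated; non-dominated ones kept
```

Under this reading, each candidate is a cheap guess at a neighbouring Pareto-front policy, so only the evaluations (and any light fine-tuning the method applies) cost additional training.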
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~J._Michael_Herrmann1
Track: Regular Track: unpublished work
Submission Number: 168