Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning

Published: 01 Aug 2024, Last Modified: 09 Oct 2024 · EWRL17 · CC BY 4.0
Keywords: Reinforcement learning, Multi-objective, Pareto front
TL;DR: We propose a multi-objective reinforcement learning algorithm that provably learns a Pareto front using a divide and conquer approach.
Abstract: A notable challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), which decomposes the task of finding the Pareto front into a sequence of constrained single-objective problems. This enables us to guarantee convergence while providing an upper bound on the distance to undiscovered Pareto-optimal solutions at each step. Empirical evaluations demonstrate that IPRO matches or outperforms methods that require additional assumptions. Furthermore, IPRO is a general-purpose multi-objective optimisation method, making it applicable to domains beyond reinforcement learning.
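To make the decomposition idea concrete, below is a minimal sketch (not the paper's actual algorithm or API) of a divide-and-conquer outer loop in the spirit of the abstract: the front is grown by repeatedly solving a constrained single-objective problem above a "referent" point and splitting the remaining unexplored region into new referents. The names `solve_for_referent`, `approximate_pareto_front`, and the splitting rule are illustrative assumptions.

```python
import numpy as np


def approximate_pareto_front(solve_for_referent, initial_referent,
                             tolerance=1e-2, max_iters=100):
    """Grow an approximate Pareto front (maximisation) from referent-constrained subproblems.

    solve_for_referent(referent) is assumed to return an achievable objective
    vector that weakly dominates `referent` if one exists, else None. In the
    RL setting this oracle would correspond to solving a constrained
    single-objective policy optimisation problem (hypothetical interface).
    """
    pareto_front = []
    # Each open referent is a lower-bound vector describing a region that may
    # still contain undiscovered Pareto-optimal points.
    open_referents = [np.asarray(initial_referent, dtype=float)]

    for _ in range(max_iters):
        if not open_referents:
            break  # no unexplored region remains
        referent = open_referents.pop()
        point = solve_for_referent(referent)
        if point is None:
            continue  # the region above this referent is empty
        point = np.asarray(point, dtype=float)
        pareto_front.append(point)
        # Divide: for each objective, spawn a child referent that excludes the
        # part of the region already covered by the newly found point.
        for i in range(len(referent)):
            child = referent.copy()
            child[i] = point[i] + tolerance
            open_referents.append(child)

    return pareto_front
```

The splitting rule above is the standard box decomposition for maximisation: after finding a point p above referent r, any remaining Pareto-optimal solution in that region must exceed p in at least one objective, so one child referent per objective covers the leftover space. The `tolerance` parameter crudely stands in for the per-step bound on the distance to undiscovered solutions mentioned in the abstract.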
Submission Number: 29