Multi-Policy Pareto Front Tracking Based Multi-Objective Reinforcement Learning

15 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0
Keywords: Multi-objective optimization, reinforcement learning, Pareto front
Abstract: Multi-objective reinforcement learning (MORL) plays a pivotal role in addressing real-world multi-criteria decision-making problems. Multi-policy (MP) approaches are widely used to obtain high-quality Pareto front approximations for MORL problems. Relying primarily on online reinforcement learning (RL), traditional MP approaches usually adopt an evolutionary framework that requires maintaining a large policy population. In practice, this often leads to sample inefficiency and/or excessive agent-environment interactions. To address these issues, we propose the Multi-policy Pareto Front Tracking (MPFT) framework, which eliminates the need to maintain any policy population and is compatible with both online and offline MORL algorithms. The proposed MPFT framework comprises four stages. Stage 1 approximates all the Pareto-vertex policies, whose mappings to the objective space lie on the vertices of the Pareto front. Stage 2 introduces a new Pareto tracking mechanism that starts from each Pareto-vertex policy to track the Pareto front, and a proof of its exponential convergence is provided. Stage 3 identifies the sparse regions in the tracked Pareto front and then applies a newly designed objective weight adjustment method to guide policy tracking toward filling these regions. Finally, Stage 4 approximates the complete Pareto front by combining all the policies tracked in Stages 2 and 3. Experiments are conducted on seven continuous-action robotic control tasks using both online and offline MORL algorithms. Results demonstrate that the proposed MPFT approach outperforms state-of-the-art benchmarks in terms of hypervolume and expected utility, while significantly reducing agent-environment interactions.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 5722
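
For readers unfamiliar with the quantities named in the abstract, the following is a minimal, generic sketch of three standard two-objective computations: filtering a non-dominated (Pareto) set, computing the hypervolume against a reference point, and locating the widest gap between neighboring front points as a simple stand-in for a "sparse region". It is not the MPFT implementation. The function names (`pareto_front`, `hypervolume_2d`, `largest_gap`), the two-objective maximization setting, and the Euclidean gap criterion are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both maximized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective ascending."""
    front: List[Point] = []
    best_second = float("-inf")
    # Sweep from the highest first objective downward; a point is kept only
    # if it strictly improves the best second objective seen so far.
    for p in sorted(set(points), key=lambda q: (-q[0], -q[1])):
        if p[1] > best_second:
            front.append(p)
            best_second = p[1]
    return front[::-1]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by `ref`.

    Assumes every point is at least as good as `ref` in both objectives.
    """
    hv = 0.0
    prev_x = ref[0]
    # Along the front, the first objective rises while the second falls,
    # so the dominated region decomposes into vertical strips.
    for x, y in pareto_front(points):
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv


def largest_gap(front: List[Point]) -> Optional[Tuple[Point, Point]]:
    """Return the adjacent pair of front points that are farthest apart.

    Hypothetical sparse-region heuristic; returns None for fewer than 2 points.
    """
    if len(front) < 2:
        return None
    return max(zip(front, front[1:]), key=lambda ab: math.dist(ab[0], ab[1]))


if __name__ == "__main__":
    returns = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0)]
    print(pareto_front(returns))
    print(hypervolume_2d(returns, ref=(0.0, 0.0)))
    print(largest_gap([(1.0, 5.0), (2.0, 4.0), (5.0, 1.0)]))
```

A larger hypervolume indicates a front that is both closer to the ideal point and better spread. The gap heuristic only illustrates what a sparse region means in objective space; the abstract does not specify how MPFT detects such regions or adjusts objective weights.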