Keywords: Multi-objective optimization, reinforcement learning, Pareto front
Abstract: Multi-objective reinforcement learning (MORL) plays a pivotal role in addressing multi-criteria decision-making problems in the real world. Multi-policy
(MP)-based approaches are widely used to obtain high-quality Pareto front approximations for MORL problems. Relying primarily on online reinforcement learning (RL), traditional MP approaches usually adopt an evolutionary
framework that requires maintaining a large policy population. In practice, however, this often leads to sample inefficiency and/or excessive agent-environment
interactions. To address these issues, we propose a novel Multi-policy Pareto
Front Tracking (MPFT) framework that eliminates the need to maintain any policy population and is compatible with both online and offline MORL algorithms. The
proposed MPFT framework comprises four stages: Stage 1 approximates all the
Pareto-vertex policies whose mappings to the objective space lie on the vertices
of the Pareto front; Stage 2 introduces a new Pareto tracking mechanism that starts
from each Pareto-vertex policy to track the Pareto front, and a proof of its exponential convergence is provided; Stage 3 identifies the sparse regions in the tracked
Pareto front and then applies a newly designed objective weight adjustment method to facilitate policy tracking that fills these regions; finally, by combining all the
policies tracked in Stages 2 and 3, Stage 4 approximates the complete Pareto front.
Experiments are conducted on seven continuous-action robotic control tasks using
both online and offline MORL algorithms. Results demonstrate that our proposed
MPFT approach outperforms state-of-the-art benchmarks in terms of hypervolume and expected utility, while significantly reducing agent-environment interactions.
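For readers less familiar with the quantities named in the abstract, the following minimal Python sketch illustrates, for two maximized objectives, what a Pareto front approximation, a sparse region, and the hypervolume indicator are. It is not the MPFT implementation: the function names (pareto_front, largest_gap, hypervolume_2d), the Euclidean-gap notion of sparsity, and the reference point are assumptions made purely for illustration.

```python
import numpy as np


def pareto_front(points):
    """Return the non-dominated rows of `points` (maximization), sorted by objective 0."""
    pts = np.unique(np.asarray(points, dtype=float), axis=0)  # drop exact duplicates
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if another point is >= in every objective and > in at least one
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    front = pts[keep]
    return front[np.argsort(front[:, 0])]


def largest_gap(front):
    """Illustrative sparsity measure (an assumption, not the paper's criterion):
    the adjacent pair on the sorted front with the largest Euclidean distance."""
    dists = np.linalg.norm(np.diff(front, axis=0), axis=1)
    i = int(np.argmax(dists))
    return i, i + 1, float(dists[i])


def hypervolume_2d(front, ref):
    """Area dominated by a sorted 2-D front and bounded below by `ref` (maximization).
    Assumes every front point is strictly better than `ref` in both objectives."""
    hv, prev_x = 0.0, ref[0]
    for x, y in front:  # objective 0 ascending, hence objective 1 descending
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv


# Hypothetical returns of five policies on two objectives
returns = [(1, 5), (2, 4), (2, 2), (4, 3), (6, 1)]
front = pareto_front(returns)               # (2, 2) is dominated by (2, 4)
i, j, gap = largest_gap(front)              # sparsest stretch: between (4, 3) and (6, 1)
hv = hypervolume_2d(front, ref=(0.0, 0.0))  # 17.0
```

In this toy setting, a gap-filling step would aim to find additional policies whose returns fall between front[i] and front[j]; how MPFT adjusts objective weights to do so is described in the paper itself and is not reproduced here.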
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 5722