Keywords: Multi-party Multi-objective Reinforcement Learning; Constrained Reinforcement Learning; Multi-objective Reinforcement Learning
TL;DR: The paper proposes a multi-party negotiation framework for safe multi-objective reinforcement learning, allowing for a Pareto front of policies that balance efficiency and safety constraints.
Abstract: Safe multi-objective reinforcement learning (Safe MORL) seeks to optimize performance while satisfying safety constraints. Existing methods face two key challenges: (i) incorporating safety as additional objectives enlarges the objective space, requiring more solutions to cover the Pareto front uniformly and maintain adaptability under changing preferences; (ii) strictly enforcing safety constraints is feasible for a single constraint or for compatible constraints, but conflicting constraints prevent flexible, preference-aware trade-offs.
To address these challenges, we cast Safe MORL within a multi-party negotiation framework that treats safety as an external regulatory perspective, enabling the search for a consensus-based multi-party Pareto-optimal set. We propose a multi-party Pareto negotiation (MPPN) strategy built on NSGA-II, which employs a negotiation threshold $\varepsilon$ to represent the acceptable solution range for each party. During evolutionary search, $\varepsilon$ is dynamically adjusted to maintain a sufficiently large negotiated solution set, progressively steering the population toward the $(\varepsilon_{\text{efficiency}}, \varepsilon_{\text{safety}})$-negotiated common Pareto set.
The framework preserves user preferences over conflicting safety constraints without introducing additional objectives and flexibly adapts to emergent scenarios through progressively guided $(\varepsilon_{\text{efficiency}}, \varepsilon_{\text{safety}})$. Experiments on a MuJoCo benchmark show that our approach outperforms state-of-the-art methods in both constrained and unconstrained MORL, as measured by multi-party hypervolume and sparsity metrics, while supporting preference-aware policy selection across stakeholders.
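The ε-negotiation idea in the abstract can be illustrated with a minimal sketch: each party accepts solutions within a threshold of its own best value, and the thresholds are relaxed until the jointly acceptable ("negotiated") set is large enough. All function names, the two-objective encoding, and the uniform relaxation rule below are illustrative assumptions, not the authors' actual MPPN/NSGA-II implementation.

```python
# Illustrative sketch of epsilon-negotiation between two parties.
# Assumption: each solution is a pair (efficiency, safety_violation),
# with efficiency maximized and safety_violation minimized.

def negotiated_set(population, eps_eff, eps_safe):
    """Return solutions acceptable to both the efficiency and safety party."""
    best_eff = max(s[0] for s in population)   # efficiency party's ideal
    best_safe = min(s[1] for s in population)  # safety party's ideal
    return [
        s for s in population
        if s[0] >= best_eff - eps_eff        # within efficiency tolerance
        and s[1] <= best_safe + eps_safe     # within safety tolerance
    ]

def relax_until(population, min_size, step=0.1):
    """Widen both thresholds until the negotiated set is large enough.

    A stand-in for the paper's dynamic adjustment of
    (eps_efficiency, eps_safety) during evolutionary search.
    """
    eps_eff = eps_safe = 0.0
    while len(negotiated_set(population, eps_eff, eps_safe)) < min_size:
        eps_eff += step
        eps_safe += step
    return eps_eff, eps_safe

# Toy population: efficient-but-unsafe through safe-but-inefficient.
pop = [(1.0, 0.8), (0.9, 0.4), (0.6, 0.1), (0.3, 0.0)]
eps = relax_until(pop, min_size=2)
consensus = negotiated_set(pop, *eps)
```

In this toy run the thresholds relax until the two middle solutions, which compromise between the parties' ideals, enter the negotiated set; in the paper this filtering would instead guide NSGA-II's population toward the multi-party Pareto set.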
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 15968