Keywords: Multi-objective multi-agent systems, fully distributed reinforcement learning, Pareto-stationary convergence.
Abstract: Multi-objective reinforcement learning (MORL) aims to optimize multiple conflicting objectives for a single agent. Finding Pareto-optimal solutions is NP-hard, and existing algorithms are often centralized and computationally expensive, which limits their practical applicability.
Multi-objective multi-agent reinforcement learning (MOMARL) extends MORL to multiple agents, which not only increases computational complexity exponentially due to the global state-action space, but also introduces communication challenges, as agents cannot continuously communicate with a central coordinator in large-scale scenarios.
This necessitates distributed algorithms, in which each agent relies only on information from its neighbors within a limited range rather than on global information.
To address these challenges, we propose a distributed MOMARL algorithm in which each agent leverages only the state of its $\kappa$-hop neighbors and locally adjusts the weights of multiple objectives through a consensus protocol.
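For illustration, the following is a minimal sketch of what a neighbor-only consensus step on an agent's objective weights could look like. The uniform mixing, the step size, and the simplex re-normalization here are assumptions made for this sketch, not the paper's actual protocol.

```python
import numpy as np

def consensus_weight_update(weights, neighbor_weights, step_size=0.5):
    """One illustrative consensus step on an agent's objective-weight vector.

    weights: this agent's current weight vector over the M objectives.
    neighbor_weights: list of weight vectors received from neighbors.
    Returns an updated, re-normalized weight vector.
    """
    # Average own weights with neighbors' weights (uniform mixing assumed).
    stacked = np.vstack([weights] + list(neighbor_weights))
    mixed = stacked.mean(axis=0)
    # Move partway toward the local consensus average.
    updated = (1 - step_size) * weights + step_size * mixed
    # Project back onto the probability simplex (nonnegative, sums to 1).
    updated = np.clip(updated, 0.0, None)
    return updated / updated.sum()

# Example: an agent with three objectives and two neighbors.
w_i = np.array([0.5, 0.3, 0.2])
w_neighbors = [np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])]
print(consensus_weight_update(w_i, w_neighbors))
```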
We introduce an approximate policy gradient that reduces the dependence on global actions and a linear function approximation that restricts the state space to local neighborhoods.
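As an illustration of what such approximations typically look like (the exact definitions below are assumed, not taken from the paper), a $\kappa$-hop truncated, linearly parameterized value estimate can stand in for the global Q-function in each agent's local policy-gradient estimate:

```latex
% Assumed illustrative form of a kappa-hop truncated, linearly approximated
% Q-function and the resulting local policy-gradient estimate; the paper's
% exact definitions may differ.
\[
  Q_i(s, a) \;\approx\; \hat{Q}_i\bigl(s_{\mathcal{N}^{\kappa}_{i}}, a_{\mathcal{N}^{\kappa}_{i}}\bigr)
           \;=\; \phi_i\bigl(s_{\mathcal{N}^{\kappa}_{i}}, a_{\mathcal{N}^{\kappa}_{i}}\bigr)^{\top} w_i,
\]
\[
  \nabla_{\theta_i} J(\theta) \;\approx\;
  \mathbb{E}\Bigl[\, \hat{Q}_i\bigl(s_{\mathcal{N}^{\kappa}_{i}}, a_{\mathcal{N}^{\kappa}_{i}}\bigr)\,
  \nabla_{\theta_i} \log \pi_{\theta_i}\bigl(a_i \mid s_{\mathcal{N}^{\kappa}_{i}}\bigr) \Bigr].
\]
```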
Each agent $i$'s computational complexity is thus reduced from $\mathcal{O}(|\mathcal{S}||\mathcal{A}|)$ with the global state-action space in centralized algorithms to $\mathcal{O}(|\mathcal{S}_{\mathcal{N}^{\kappa}_{i}}||\mathcal{A}_{i}|)$ with the $\kappa$-hop neighborhood state space and the local action space.
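To make the complexity comparison concrete, here is a small back-of-the-envelope sketch; all sizes are hypothetical numbers chosen only for illustration, assuming factored per-agent state and action spaces.

```python
# Hypothetical sizes chosen only to make the complexity comparison concrete.
n_agents = 20          # number of agents
local_states = 10      # states per agent, |S_i|
local_actions = 4      # actions per agent, |A_i|
kappa_neighbors = 3    # agents in the kappa-hop neighborhood of agent i

# Centralized: tables scale with the global state-action space.
global_table = (local_states ** n_agents) * (local_actions ** n_agents)

# Distributed: agent i only needs its kappa-hop neighborhood state and its own action.
local_table = (local_states ** kappa_neighbors) * local_actions

print(f"centralized table entries: {global_table:.2e}")
print(f"per-agent local table entries: {local_table}")
```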
We prove that the algorithm converges to a Pareto-stationary solution at a rate of $\mathcal{O}(1/T)$ and demonstrate in robot path-planning simulations that our approach achieves higher multi-objective values than a state-of-the-art method.
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 16071