Keywords: Multi-objective multi-agent systems, fully distributed reinforcement learning, Pareto-stationary convergence.
Abstract: Multi-objective reinforcement learning (MORL) aims to optimize multiple conflicting objectives for a single agent. Finding Pareto-optimal solutions is NP-hard, and existing algorithms are often centralized and computationally expensive, which limits their practical applicability.
Multi-objective multi-agent reinforcement learning (MOMARL) extends MORL to multiple agents, which not only increases computational complexity exponentially due to the global state-action space, but also introduces communication challenges, as agents cannot continuously communicate with a central coordinator in large-scale scenarios.
This necessitates distributed algorithms, in which each agent relies only on information from its neighbors within a limited range rather than on global information.
To address these challenges, we propose a distributed MOMARL algorithm in which each agent leverages only the state of its $\kappa$-hop neighbors and locally adjusts the weights of multiple objectives through a consensus protocol.
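For illustration, the following is a minimal sketch of what a neighbor-only consensus step on an agent's objective weights could look like. The uniform mixing, the step size, and the simplex re-normalization here are assumptions made for this sketch, not the paper's actual protocol.

```python
import numpy as np

def consensus_weight_update(weights, neighbor_weights, step_size=0.5):
    """One illustrative consensus step on an agent's objective-weight vector.

    weights: this agent's current weight vector over the M objectives.
    neighbor_weights: list of weight vectors received from neighbors.
    Returns an updated, re-normalized weight vector.
    """
    # Average own weights with neighbors' weights (uniform mixing assumed).
    stacked = np.vstack([weights] + list(neighbor_weights))
    mixed = stacked.mean(axis=0)
    # Move partway toward the local consensus average.
    updated = (1 - step_size) * weights + step_size * mixed
    # Project back onto the probability simplex (nonnegative, sums to 1).
    updated = np.clip(updated, 0.0, None)
    return updated / updated.sum()

# Example: an agent with three objectives and two neighbors.
w_i = np.array([0.5, 0.3, 0.2])
w_neighbors = [np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])]
print(consensus_weight_update(w_i, w_neighbors))
```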
We introduce an approximate policy gradient that reduces the dependence on global actions and a linear function approximation that restricts the state space to local neighborhoods.
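As an illustration of what such approximations typically look like (the exact definitions below are assumed, not taken from the paper), a $\kappa$-hop truncated, linearly parameterized value estimate can stand in for the global Q-function in each agent's local policy-gradient estimate:

```latex
% Assumed illustrative form of a kappa-hop truncated, linearly approximated
% Q-function and the resulting local policy-gradient estimate; the paper's
% exact definitions may differ.
\[
  Q_i(s, a) \;\approx\; \hat{Q}_i\bigl(s_{\mathcal{N}^{\kappa}_{i}}, a_{\mathcal{N}^{\kappa}_{i}}\bigr)
           \;=\; \phi_i\bigl(s_{\mathcal{N}^{\kappa}_{i}}, a_{\mathcal{N}^{\kappa}_{i}}\bigr)^{\top} w_i,
\]
\[
  \nabla_{\theta_i} J(\theta) \;\approx\;
  \mathbb{E}\Bigl[\, \hat{Q}_i\bigl(s_{\mathcal{N}^{\kappa}_{i}}, a_{\mathcal{N}^{\kappa}_{i}}\bigr)\,
  \nabla_{\theta_i} \log \pi_{\theta_i}\bigl(a_i \mid s_{\mathcal{N}^{\kappa}_{i}}\bigr) \Bigr].
\]
```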
Each agent $i$'s computational complexity is thus reduced from $\mathcal{O}(|\mathcal{S}||\mathcal{A}|)$ with the global state-action space in centralized algorithms to $\mathcal{O}(|\mathcal{S}_{\mathcal{N}^{\kappa}_{i}}||\mathcal{A}_{i}|)$ with the $\kappa$-hop neighborhood state space and the local action space.
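To make the complexity comparison concrete, here is a small back-of-the-envelope sketch; all sizes are hypothetical numbers chosen only for illustration, assuming factored per-agent state and action spaces.

```python
# Hypothetical sizes chosen only to make the complexity comparison concrete.
n_agents = 20          # number of agents
local_states = 10      # states per agent, |S_i|
local_actions = 4      # actions per agent, |A_i|
kappa_neighbors = 3    # agents in the kappa-hop neighborhood of agent i

# Centralized: tables scale with the global state-action space.
global_table = (local_states ** n_agents) * (local_actions ** n_agents)

# Distributed: agent i only needs its kappa-hop neighborhood state and its own action.
local_table = (local_states ** kappa_neighbors) * local_actions

print(f"centralized table entries: {global_table:.2e}")
print(f"per-agent local table entries: {local_table}")
```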
We prove that the algorithm converges to a Pareto-stationary solution at a rate of $\mathcal{O}(1/T)$ and demonstrate in robot path-planning simulations that our approach achieves higher multi-objective values than a state-of-the-art method.
Supplementary Material: pdf
Primary Area: reinforcement learning
Submission Number: 16071