Multi-objective Multi-agent Reinforcement Learning with Pareto-stationary Convergence

25 Sept 2024 (modified: 28 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Multi-objective, multi-agent reinforcement learning, Pareto-stationary convergence
Abstract: Multi-objective multi-agent reinforcement learning (MOMARL) problems frequently arise in real-world applications (e.g., path planning for swarm robots), yet they have not been well explored. Finding a Pareto optimum is NP-hard, so several multi-objective algorithms have recently emerged that provide Pareto-stationary solutions in a centralized manner, managed by a single agent. However, they cannot handle the MOMARL problem, as the dimension of the global state-action pair $(\boldsymbol{s},\boldsymbol{a})$ grows exponentially with the number of spatially distributed agents. To tackle this issue, we design a novel graph-truncated $Q$-function approximation method for each agent $i$, which requires not the global state-action pair $(\boldsymbol{s},\boldsymbol{a})$ but only the neighborhood state-action pair $(s_{\mathcal{N}^{\kappa}_{i}},a_{\mathcal{N}^{\kappa}_{i}})$ of its $\kappa$-hop neighbors. To further reduce the dimension to the state-action pair $(s_{\mathcal{N}^{\kappa}_{i}},a_{i})$ with only the local action, we develop the concept of an action-averaged $Q$-function and establish the equivalence between using the graph-truncated $Q$-function and the action-averaged $Q$-function for policy gradient approximation. Accordingly, we develop a distributed, scalable algorithm with linear function approximation and prove that it converges to a Pareto-stationary solution at rate $\mathcal{O}(1/T)$, inversely proportional to the time horizon $T$. Finally, we run simulations in a robot path-planning environment and show that our algorithm converges to larger multi-objective values than the latest MORL algorithm and performs close to the centralized optimum with much shorter running time.
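The submission's code is not shown on this page, so the following is only a minimal, hypothetical Python sketch of the graph-truncation idea described in the abstract: an agent's $Q$-value is approximated linearly from features of its $\kappa$-hop neighborhood state-action pair alone, never the global $(\boldsymbol{s},\boldsymbol{a})$. The adjacency structure, feature map `feature_fn`, and weight vector `weights` are illustrative assumptions, not the authors' algorithm (which additionally involves the action-averaged $Q$-function and the full distributed policy-gradient updates).

```python
import numpy as np

# Hypothetical sketch (not the paper's code): graph-truncated Q-value for one agent,
# using only the states/actions of its kappa-hop neighbors and a linear feature model.
# `adjacency`, `feature_fn`, and `weights` below are illustrative placeholders.

def kappa_hop_neighbors(adjacency, agent, kappa):
    """Breadth-first search: agent i plus every agent within kappa hops."""
    frontier, visited = {agent}, {agent}
    for _ in range(kappa):
        frontier = {j for u in frontier for j in adjacency[u]} - visited
        visited |= frontier
    return sorted(visited)

def truncated_q(weights, feature_fn, states, actions, adjacency, agent, kappa):
    """Linear estimate of Q_i(s_{N^kappa_i}, a_{N^kappa_i}) ~= w_i^T phi(.).

    The global (s, a) is never formed; only the neighborhood slice enters the
    features, so the parameter dimension does not grow with the agent count.
    """
    nbrs = kappa_hop_neighbors(adjacency, agent, kappa)
    phi = feature_fn({j: states[j] for j in nbrs}, {j: actions[j] for j in nbrs})
    return float(weights @ phi)

# Toy usage on a 4-agent line graph 0-1-2-3 with kappa = 1.
if __name__ == "__main__":
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    states = np.array([0.2, -0.1, 0.5, 0.3])
    actions = np.array([1, 0, 1, 1])
    feature_fn = lambda s, a: np.array([sum(s.values()), sum(a.values()), 1.0])
    weights = np.array([0.4, -0.3, 0.1])
    print(truncated_q(weights, feature_fn, states, actions, adjacency, 1, kappa=1))
```

The only point of the sketch is that the feature dimension depends on the size of the $\kappa$-hop neighborhood rather than on the total number of agents, which is the scalability property the abstract claims.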
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4037