Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Weighted-Chebyshev Multi-Objective Actor-Critic Approach
Keywords: Multi-Objective Reinforcement Learning, Actor-Critic Algorithm
Abstract: In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore the Pareto-stationary solutions under multiple non-convex reward objectives with theoretical finite-time sample complexity guarantee is an important and yet under-explored problem.
This motivates us to take the first step and fill the important gap in MORL.
Specifically, in this paper, we propose a weighted-Chebyshev multi-objective actor-critic (\policyns) algorithm for MORL, which uses multi-temporal-difference (TD) learning in the critic step and judiciously integrates the weighted-Chebychev (WC) and multi-gradient descent techniques in the actor step to enable systematic Pareto-stationarity exploration with finite-time sample complexity guarantee.
Our proposed \policy algorithm achieves a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2}p^{-2}\_{\min})$ in finding an $\epsilon$-Pareto-stationary solution, where $p_{\min}$ denotes the minimum entry of a given weight vector $p$ in the WC-scarlarization.
This result not only implies a state-of-the-art sample complexity that is independent of objective number $M$, but also brand-new dependence result in terms of the preference vector $p$.
Furthermore, simulation studies on a large KuaiRand offline dataset, show that the performance of our \policy algorithm significantly outperforms other baseline MORL approaches.
Supplementary Material: pdf
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12698
Loading