Global Convergence and Pareto Front Exploration in Deep-Neural Actor-Critic Multi-Objective Reinforcement Learning
Keywords: Multi-objective reinforcement learning, Deep neural network, Finite-time global convergence.
Abstract: Multi-objective reinforcement learning (MORL) has gained considerable traction in recent years, with applications across diverse domains.
However, its theoretical foundations remain underdeveloped, especially for widely used but largely heuristic deep neural network (DNN)-based actor–critic methods.
This motivates us to study MORL from a theoretical perspective and to develop DNN-based actor–critic approaches that (i) provide global convergence guarantees to Pareto-optimal policies and (ii) enable systematic exploration of the entire Pareto front (PF).
To achieve systematic PF exploration, we first scalarize the original vector-valued MORL problem using the weighted Chebyshev (WC) technique, exploiting the one-to-one correspondence between the PF and WC scalarizations.
We then address the non-smoothness that WC introduces into the scalarized problem via a parameterized log-sum-exp (softmax) approximation (both steps are sketched after the abstract), which allows us to design a deep neural actor–critic method for solving the smoothed WC-scalarized MORL problem with a global convergence rate of $\mathcal{O}(1/T)$, where $T$ denotes the total number of iterations.
To the best of our knowledge, this is the first work to establish theoretical guarantees for both global convergence and systematic Pareto front exploration in deep neural actor–critic MORL.
Finally, extensive numerical experiments and ablation studies on recommendation-system training and robotic simulation further validate the effectiveness of our method, especially its capability for Pareto front exploration.
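For concreteness, here is a minimal sketch of the two scalarization steps referenced in the abstract, written under assumed notation (vector objective $J(\theta)=(J_1(\theta),\dots,J_m(\theta))$, preference weights $\lambda$ in the simplex $\Delta_m$, reference point $z$, smoothing parameter $\mu>0$); the notation is illustrative and not taken from the paper:
$$\max_{\theta}\ \min_{i\in[m]}\ \lambda_i\big(J_i(\theta)-z_i\big) \qquad \text{(weighted Chebyshev scalarization)},$$
$$\min_{i\in[m]} x_i \;\approx\; -\mu\log\sum_{i=1}^{m}\exp(-x_i/\mu) \qquad \text{(log-sum-exp smoothing; approximation error at most } \mu\log m\text{)}.$$
Sweeping the weights $\lambda$ over the simplex then traces out candidate Pareto-optimal policies, while the smoothed objective is what the deep neural actor–critic method optimizes.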
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 14511