Multi-Objective Utility Actor Critic with Utility Critic for Nonlinear Utility Function

Published: 17 Jul 2025, Last Modified: 06 Sept 2025
EWRL 2025 Poster
License: CC BY 4.0
Keywords: MORL, eligibility trace, actor critic, policy gradient
TL;DR: An on-policy multi-objective actor-critic algorithm for the ESR setting with a nonlinear utility function
Abstract: In multi-objective reinforcement learning (MORL), non-linear utility functions pose a significant challenge, as the two optimization criteria—scalarized expected return (SER) and expected scalarized return (ESR)—can diverge substantially. Applying single-objective reinforcement learning methods to ESR problems often introduces bias, particularly in the presence of non-linear utilities. Moreover, existing MORL policy-based algorithms, such as EUPG and MOCAC, suffer from numerous hyperparameters, large search spaces, high variance, and low learning efficiency, which frequently result in sub-optimal policies. In this paper, we propose a new multi-objective policy search algorithm called Multi-Objective Utility Actor-Critic (MOUAC). For the first time in the field, MOUAC introduces a Utility Critic based on expected state utility, replacing the Q-value critic, value-function critic, or distributional critic built on Q-values or value functions. To address the high-variance challenges inherent in MORL, MOUAC also adapts the traditional eligibility trace to the multi-objective setting, yielding the MnES-return. Empirically, we demonstrate that our algorithm achieves state-of-the-art (SOTA) performance in on-policy multi-objective policy search.
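A minimal sketch of the core idea described in the abstract: an ESR-style actor-critic step where the critic predicts the expected utility of the final accrued vector return, rather than a Q-value or value function. This is not the authors' implementation; the network shapes, the utility function `u`, and the conditioning on the accrued return are illustrative assumptions, and the eligibility-trace (MnES-return) component is omitted.

```python
# Hypothetical sketch of a "utility critic" actor-critic update for the ESR setting.
# Assumptions: the policy and critic are conditioned on (observation, accrued vector return),
# and the critic regresses toward the expected utility of the final accrued return.
import torch
import torch.nn as nn

N_OBJ = 2            # number of objectives (assumed)
OBS_DIM, N_ACT = 4, 3

def u(vec_return: torch.Tensor) -> torch.Tensor:
    """Example non-linear utility over a vector return (illustrative assumption)."""
    return vec_return[..., 0] * vec_return[..., 1]

actor = nn.Sequential(nn.Linear(OBS_DIM + N_OBJ, 64), nn.Tanh(), nn.Linear(64, N_ACT))
critic = nn.Sequential(nn.Linear(OBS_DIM + N_OBJ, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

def update(obs, acc_ret, action, next_obs, next_acc_ret, done):
    """One on-policy step: the critic tracks expected utility; the actor follows its advantage."""
    x = torch.cat([obs, acc_ret], dim=-1)
    x_next = torch.cat([next_obs, next_acc_ret], dim=-1)
    pred_u = critic(x).squeeze(-1)
    with torch.no_grad():
        # Bootstrapped utility target; at termination the utility of the accrued return is exact.
        target = u(next_acc_ret) if done else critic(x_next).squeeze(-1)
    critic_loss = (pred_u - target).pow(2).mean()
    advantage = (target - pred_u).detach()
    log_prob = torch.distributions.Categorical(logits=actor(x)).log_prob(action)
    actor_loss = -(advantage * log_prob).mean()
    opt.zero_grad()
    (critic_loss + actor_loss).backward()
    opt.step()
```

The key design choice illustrated here is that the bootstrap target is a utility estimate, so the non-linear utility is applied to the accrued vector return rather than to an expected return, which is what distinguishes the ESR criterion from SER.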
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Gao_Peng5
Track: Regular Track: unpublished work
Submission Number: 70