Keywords: MORL, eligibility trace, actor critic, policy gradient
TL;DR: An on-policy multi-objective actor-critic algorithm for the ESR setting with a non-linear utility function
Abstract: In multi-objective reinforcement learning (MORL), non-linear utility functions pose a significant challenge, as the two optimization criteria, the scalarized expected return (SER) and the expected scalarized return (ESR), can diverge substantially. Applying single-objective reinforcement learning methods to ESR problems often introduces bias, particularly under non-linear utilities. Moreover, existing MORL policy-based algorithms such as EUPG and MOCAC suffer from numerous hyperparameters, large search spaces, high variance, and low learning efficiency, which frequently result in sub-optimal policies.
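For reference, the two criteria mentioned above are commonly formalised as follows (notation assumed here: vector-valued reward $\mathbf{r}_t$, non-linear utility $u$, discount factor $\gamma$); SER applies the utility to the expected vector return, whereas ESR takes the expectation of the utility of the realised return:
$$
\text{SER:}\quad \max_\pi \; u\!\Big(\mathbb{E}_\pi\Big[\textstyle\sum_{t=0}^{\infty}\gamma^{t}\mathbf{r}_t\Big]\Big),
\qquad
\text{ESR:}\quad \max_\pi \; \mathbb{E}_\pi\Big[u\!\Big(\textstyle\sum_{t=0}^{\infty}\gamma^{t}\mathbf{r}_t\Big)\Big].
$$
When $u$ is non-linear, these objectives generally differ, which is why single-objective methods optimising the SER form can be biased for ESR problems.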
In this paper, we propose a new multi-objective policy search algorithm called Multi-Objective Utility Actor-Critic (MOUAC). For the first time in the field, MOUAC introduces a Utility Critic based on the expected state utility, replacing the Q-value critic, value-function critic, or distributional critic built on Q-values or value functions. To address the high variance inherent in MORL, MOUAC also adapts the traditional eligibility trace to the multi-objective setting, yielding the MnES-return. Empirically, we demonstrate that our algorithm achieves state-of-the-art (SOTA) performance in on-policy multi-objective policy search.
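To make the utility-critic idea concrete, below is a minimal, purely illustrative sketch of how lambda-mixed n-step targets could be built for a critic that predicts the expected utility of the final scalarised return in an ESR setting. The function names, the accrued-reward conditioning, and the target construction are assumptions for illustration only; they are not taken from the MOUAC paper and need not match the MnES-return.

```python
# Illustrative sketch only: lambda-weighted n-step targets for a utility critic
# U(s, r_acc) that estimates the expected utility of the final scalarised
# return (ESR setting). Names and structure are assumptions, not MOUAC itself.
import numpy as np

def utility(returns):
    """Hypothetical non-linear utility over a multi-objective return vector."""
    return returns[0] * returns[1]  # e.g. product of two objectives

def lambda_utility_targets(rewards, accrued, critic, gamma=0.99, lam=0.9):
    """Compute lambda-mixed n-step targets for a utility critic.

    rewards : (T, d) array of multi-objective rewards r_t
    accrued : (T, d) array of rewards accrued before each step t
    critic  : callable (t, accrued_vector) -> estimated expected utility
    """
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        n_step = []                 # n-step bootstrapped utility estimates
        acc = accrued[t].copy()
        discount = 1.0
        for n in range(t, T):
            acc = acc + discount * rewards[n]
            discount *= gamma
            if n + 1 < T:
                n_step.append(critic(n + 1, acc))  # bootstrap with the utility critic
            else:
                n_step.append(utility(acc))        # terminal step: true utility of the return
        # standard lambda-return mixture: (1 - lam) * lam^k on partial targets,
        # remaining probability mass on the full-episode target
        weights = np.array([(1 - lam) * lam**k for k in range(len(n_step))])
        weights[-1] = lam ** (len(n_step) - 1)
        targets[t] = float(np.dot(weights, n_step))
    return targets
```

The sketch only shows the generic eligibility-trace-style target construction applied to utilities; how MOUAC actually defines and uses the MnES-return is specified in the paper itself.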
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Gao_Peng5
Track: Regular Track: unpublished work
Submission Number: 70