Solving single-objective tasks by preference multi-objective reinforcement learning

Jinsheng Ren; Shangqi Guo; Feng Chen

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

Keywords: reinforcement learning, single-objective tasks, multi-objectivization

TL;DR: Solving complex single-objective tasks by preference multi-objective reinforcement learning.

Abstract: Many single-objective tasks in the real world are inevitably related to, and influenced by, other objectives. We call such a task an objective-constrained task; it is inherently a multi-objective problem. Because the objectives conflict with one another, a trade-off is needed. A common compromise is to design a scalar reward function by using expert prior knowledge to clarify the relationship among the objectives. However, reward engineering is extremely cumbersome and can result in behaviors that optimize the reward function without actually satisfying our preferences. In this paper, we explicitly cast the objective-constrained task as preference multi-objective reinforcement learning, with the overall goal of finding a Pareto optimal policy. Combined with the Trajectory Preference Domination that we propose, a weight vector that reflects the agent's preference for each objective can be learned. We analyze the feasibility of our algorithm theoretically and show in experiments that it performs better than methods whose reward function is designed by experts.
