- Keywords: reinforcement learning, single-objective tasks, multi-objectivization
- TL;DR: Solving complex single-objective tasks by preference multi-objective reinforcement learning.
- Abstract: There ubiquitously exist many single-objective tasks in the real world that are inevitably related to some other objectives and influenced by them. We call such task as the objective-constrained task, which is inherently a multi-objective problem. Due to the conflict among different objectives, a trade-off is needed. A common compromise is to design a scalar reward function through clarifying the relationship among these objectives using the prior knowledge of experts. However, reward engineering is extremely cumbersome. This will result in behaviors that optimize our reward function without actually satisfying our preferences. In this paper, we explicitly cast the objective-constrained task as preference multi-objective reinforcement learning, with the overall goal of finding a Pareto optimal policy. Combined with Trajectory Preference Domination we propose, a weight vector that reflects the agent's preference for each objective can be learned. We analyzed the feasibility of our algorithm in theory, and further proved in experiments its better performance compared to those that design the reward function by experts.