PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

Toygun Basaklar; Suat Gumussoy; Umit Ogras

PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

Toygun Basaklar, Suat Gumussoy, Umit Ogras

Published: 01 Feb 2023, Last Modified: 02 Mar 2023ICLR 2023 posterReaders: Everyone

Keywords: multi-objective reinforcement learning, MORL, DDQN, TD3, HER, continuous control, robotics application

TL;DR: A novel approach that obtains a single policy network optimizing multiple objectives using multi-objective reinforcement learning on challenging continuous control tasks.

Abstract: Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find fixed customized policies corresponding to preference vectors specified during training. However, the design constraints and objectives typically change dynamically in real-life scenarios. Furthermore, storing a policy for each potential preference is not scalable. Hence, obtaining a set of Pareto front solutions for the entire preference space in a given domain with a single training is critical. To this end, we propose a novel MORL algorithm that trains a single universal network to cover the entire preference space scalable to continuous robotic tasks. The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters. It also employs a novel parallelization approach to increase sample efficiency. We show that PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)

Supplementary Material: zip

18 Replies

Loading