Multi-Reward Fusion: Learning from Other Policies by Distilling

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission
Keywords: Energy-Based, Policy Distilling, Reinforcement Learning, Auto Reward Shaping
TL;DR: Multi-Reward Fusion: Learn from other policies by distilling
Abstract: Designing rewards is crucial for applying reinforcement learning in practice. However, it is difficult to design a shaping reward that accelerates the agent's learning process without biasing the original task's optimization objective. Moreover, the low-dimensional representation of the reward and value function (i.e., a scalar value) may also be an obstruction during the learning process. This paper contributes towards tackling these challenges by proposing a new method, called Multi-Reward Fusion (MRF). MRF takes as input a list of human-designed rewards, which contains information about the task from multiple perspectives, and learns a separate policy for each component of the reward list. We formulate the problem of learning the target policy as a distillation task, propose a novel method that selectively distills knowledge from the auxiliary policies, and theoretically show the feasibility of this method. We conduct extensive experiments and show that the MRF method performs better than state-of-the-art reward shaping methods.
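The abstract describes training one auxiliary policy per reward component and then distilling them into a single target policy. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the authors' implementation: the names (PolicyNet, distill_step), the fixed softmax fusion weights, and the KL-based distillation objective are all assumptions standing in for the paper's selective distillation mechanism.

```python
# Hypothetical sketch of multi-reward policy distillation (not the MRF algorithm itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Small categorical policy over discrete actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def logits(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def distill_step(target: PolicyNet,
                 auxiliaries: list[PolicyNet],
                 weights: torch.Tensor,
                 obs: torch.Tensor,
                 optimizer: torch.optim.Optimizer) -> float:
    """One distillation update: pull the target policy toward a weighted
    mixture of auxiliary policies, each assumed to have been trained on one
    reward component. The per-auxiliary weighting is where a 'selective'
    fusion rule would enter; here it is a fixed softmax for illustration."""
    target_logp = F.log_softmax(target.logits(obs), dim=-1)
    w = torch.softmax(weights, dim=0)
    loss = torch.zeros(())
    for k, aux in enumerate(auxiliaries):
        with torch.no_grad():
            aux_p = F.softmax(aux.logits(obs), dim=-1)
        # KL(aux || target), weighted by this auxiliary's fusion weight.
        loss = loss + w[k] * F.kl_div(target_logp, aux_p, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

if __name__ == "__main__":
    obs_dim, n_actions, n_rewards = 8, 4, 3
    target = PolicyNet(obs_dim, n_actions)
    auxiliaries = [PolicyNet(obs_dim, n_actions) for _ in range(n_rewards)]
    weights = torch.zeros(n_rewards)        # fusion weights (fixed here)
    opt = torch.optim.Adam(target.parameters(), lr=1e-3)
    obs = torch.randn(32, obs_dim)          # stand-in batch of observations
    print("distillation loss:", distill_step(target, auxiliaries, weights, obs, opt))
```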
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
Supplementary Material: zip
