Learning to Shape Rewards using a Game of Two Partners

David Henry Mguni; Jianhong Wang; Taher Jafferjee; Nicolas Perez-Nieves; Wenbin Song; Feifei Tong; Hui Chen; Jiangcheng Zhu; Yaodong Yang; Jun Wang

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. Venue: ICLR 2022 Submitted. Readers: Everyone

Keywords: Reinforcement learning, Reward Shaping, Markov game, Sparse rewards

Abstract: Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimal Shaping Algorithm (ROSA), an automated RS framework in which the shaping-reward function is constructed in a novel Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine the states at which to add shaping rewards, and their optimal values, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which easily adopts existing RL algorithms, learns to construct a shaping-reward function that is tailored to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA’s congenial properties in three carefully designed experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse-reward environments.
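As a rough illustration of the two-agent idea in the abstract, the sketch below shows how a Shaper's switching decision could gate a shaping reward that is added to the environment reward seen by the Controller. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `ShaperDecision` and `shaped_reward` are hypothetical, and the simple additive form is assumed for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ShaperDecision:
    """Hypothetical container for the Shaper's output at one state."""
    switch_on: bool  # assumed switching control: add a shaping reward here or not
    bonus: float     # assumed shaping-reward value chosen by the Shaper


def shaped_reward(env_reward: float, decision: ShaperDecision) -> float:
    """Reward the Controller would learn from (assumed additive form).

    The shaping bonus is added only at states where the Shaper has
    switched shaping on; elsewhere the environment reward is unchanged.
    """
    if decision.switch_on:
        return env_reward + decision.bonus
    return env_reward


if __name__ == "__main__":
    # Sparse environment reward of 0.0, Shaper switches on a bonus of 0.5.
    print(shaped_reward(0.0, ShaperDecision(switch_on=True, bonus=0.5)))
    # Shaper switched off: the Controller sees only the environment reward.
    print(shaped_reward(1.0, ShaperDecision(switch_on=False, bonus=0.5)))
```

In this sketch the Controller could be any standard RL learner that consumes `shaped_reward` in place of the raw reward, which is one way to read the abstract's remark that ROSA easily adopts existing RL algorithms.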
