Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Published: 31 Oct 2022, 18:00, Last Modified: 15 Oct 2022, 23:59
NeurIPS 2022 Accept
Readers: Everyone
Keywords: Reward Shaping, Regret Analysis
TL;DR: We provide regret analysis of reward shaping
Abstract: The success of reinforcement learning in a variety of challenging sequential decision-making problems has been much discussed, but this discussion often ignores how the choice of reward function affects the behavior of these algorithms. Most practical RL algorithms require copious amounts of reward engineering in order to successfully solve challenging tasks. This type of "reward shaping" has often been discussed in the literature and is used in practice, but there is relatively little formal characterization of how the choice of reward shaping can yield benefits in sample complexity for RL problems. In this work, we build on the framework of novelty-based exploration to provide a simple scheme for incorporating shaped rewards into RL, along with an analysis tool to show that particular choices of reward shaping provably improve sample efficiency. We characterize the class of problems where these gains are expected to be significant and show how this can be connected to practical algorithms in the literature. We show that these results also hold in practice in experimental evaluations, providing insight into the mechanisms through which reward shaping can significantly improve the sample complexity of reinforcement learning while retaining asymptotic performance.
Supplementary Material: pdf