Keywords: Reinforcement Learning, Generalization, Hyperbolic Discounting, Procgen
TL;DR: We present hyperbolic discounting-based advantage estimation for policy-gradient optimization to improve generalization.
Abstract: In reinforcement learning (RL), agents typically discount future rewards using an exponential scheme. However, studies have shown that humans and animals instead exhibit hyperbolic time preferences and thus discount future rewards hyperbolically. In the quest for RL agents that generalize well to previously unseen scenarios, we study the effects of hyperbolic discounting on generalization tasks and present Hyperbolic Discounting for Generalization in Reinforcement Learning (HDGenRL). We propose a hyperbolic discounting-based advantage estimation method that makes the agent aware of, and robust to, the underlying uncertainty of survival and episode duration. On the challenging RL generalization benchmark Procgen, our proposed approach achieves up to 200% performance improvement over the PPO baseline that uses classical exponential discounting. We also incorporate hyperbolic discounting into another generalization-specific approach (APDAC), and the results indicate further improvement in APDAC's generalization ability. This demonstrates the effectiveness of our approach as a plug-in to existing methods for improving generalization.
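As background for the contrast the abstract draws: exponential discounting weights a reward t steps in the future by gamma**t, whereas hyperbolic discounting weights it by 1/(1 + k*t), which decays more slowly over long horizons. A minimal sketch of the two schemes (the function names and the example values gamma=0.99 and k=0.01 are illustrative assumptions, not the paper's hyperparameters, and the simple discounted sum below is not HDGenRL's actual advantage estimator):

```python
def exponential_discount(t, gamma=0.99):
    """Classical exponential weight gamma**t for a reward t steps ahead."""
    return gamma ** t

def hyperbolic_discount(t, k=0.01):
    """Hyperbolic weight 1 / (1 + k*t); decays more slowly at long horizons."""
    return 1.0 / (1.0 + k * t)

def hyperbolic_return(rewards, k=0.01):
    """Simple hyperbolically discounted sum of a reward sequence (a sketch,
    not the paper's advantage estimation method)."""
    return sum(hyperbolic_discount(t, k) * r for t, r in enumerate(rewards))
```

Both weights equal 1 at t=0, but at long horizons the hyperbolic weight dominates (e.g. at t=100, 1/(1+1) = 0.5 versus 0.99**100 ≈ 0.366), which is the slower long-horizon decay the abstract attributes to human and animal time preferences.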