Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Harm H van Seijen, Mehdi Fatemi, Arash Tavakoli

06 Sept 2019 (modified: 05 May 2023) | NeurIPS 2019 | Readers: Everyone
Abstract: In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that the poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis, which states that the size-difference of the action-gap across the state-space is the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
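
The action-gap hypothesis in the abstract can be illustrated with a small toy calculation. The sketch below is not the authors' algorithm or one of their experiments; it assumes a hypothetical deterministic chain in which a "forward" action moves one step toward a goal that pays reward 1 on arrival, and a "stay" action keeps the agent in place with reward 0. The function name `action_gaps` and the chain itself are assumptions made for illustration only. Under these assumptions, the optimal values at distance d from the goal are Q(d, forward) = gamma^(d-1) and Q(d, stay) = gamma^d, so the regular action-gap shrinks geometrically with distance while the gap between the logarithms of the values stays constant.

```python
import math


def action_gaps(gamma, max_distance):
    """Action-gaps in a toy chain (an illustrative assumption, not the paper's setup).

    Returns two lists indexed by distance d = 1..max_distance from the goal:
    the gap between optimal action values, and the gap between their logarithms.
    """
    regular, logspace = [], []
    for d in range(1, max_distance + 1):
        q_forward = gamma ** (d - 1)  # reach the goal in d steps, reward 1 on arrival
        q_stay = gamma ** d           # waste one step, then act optimally
        regular.append(q_forward - q_stay)
        logspace.append(math.log(q_forward) - math.log(q_stay))
    return regular, logspace


regular, logspace = action_gaps(gamma=0.5, max_distance=10)
# Ratio between the largest and smallest regular action-gap across states.
print(max(regular) / min(regular))
# Spread of the log-space gaps across states (close to zero).
print(max(logspace) - min(logspace))
```

In this toy chain the regular action-gaps differ by a factor of gamma^-(max_distance - 1) between the nearest and farthest states, whereas every log-space gap equals -log(gamma). This is only meant to make the notion of "more homogeneous action-gaps" concrete; the paper's actual method and convergence analysis are described in the full text.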
CMT Num: 7875