Abstract: The discounting mechanism in Reinforcement Learning determines the relative importance of future and present rewards.
While exponential discounting is widely used in practice, non-exponential discounting methods that align with human behavior are often desirable for creating human-like agents.
However, non-exponential discounting methods cannot be directly applied in modern on-policy actor-critic algorithms like PPO.
To address this issue, we propose Universal Generalized Advantage Estimation (UGAE), which allows for the computation of GAE advantages with arbitrary discounting.
Additionally, we introduce Beta-weighted discounting, a continuous interpolation between exponential and hyperbolic discounting, to increase flexibility in choosing a discounting method.
To showcase the utility of UGAE, we provide an analysis of the properties of various discounting methods.
We also show experimentally that agents trained via UGAE with non-exponential (Beta-weighted) discounting outperform variants trained with Monte Carlo advantage estimation on standard RL benchmarks. UGAE is simple and can be integrated into any advantage-based algorithm as a drop-in replacement for the standard recursive GAE.
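To make the two core ideas concrete, below is a minimal Python sketch, not taken from the paper, of (i) Beta-weighted discount coefficients, assuming they are the moments E[γ^t] of a Beta(α, β) distribution placed over the exponential discount factor, and (ii) the Monte Carlo advantage baseline under such an arbitrary discount vector. All function and variable names are illustrative only.

```python
import numpy as np

def beta_weighted_discounts(alpha: float, beta: float, horizon: int) -> np.ndarray:
    """Discount coefficients Gamma_t = E[gamma^t] with gamma ~ Beta(alpha, beta),
    i.e. Gamma_t = prod_{i=0}^{t-1} (alpha + i) / (alpha + beta + i)  (assumed form).

    beta = 1 gives hyperbolic discounting 1 / (1 + t / alpha); letting the Beta
    distribution concentrate at a point gamma recovers exponential discounting gamma^t.
    """
    i = np.arange(horizon - 1)
    ratios = (alpha + i) / (alpha + beta + i)
    return np.concatenate(([1.0], np.cumprod(ratios)))

def monte_carlo_advantages(rewards: np.ndarray, values: np.ndarray,
                           discounts: np.ndarray) -> np.ndarray:
    """Monte Carlo baseline advantages A_t = sum_l Gamma_l * r_{t+l} - V(s_t)
    for a single finished episode, under an arbitrary discount vector."""
    T = len(rewards)
    returns = np.array([np.dot(discounts[: T - t], rewards[t:]) for t in range(T)])
    return returns - values

# Illustrative usage on a toy episode
rewards = np.array([1.0, 0.0, 0.0, 2.0])
values = np.zeros(4)  # placeholder critic values
gammas = beta_weighted_discounts(alpha=3.0, beta=1.0, horizon=4)  # hyperbolic-like
print(monte_carlo_advantages(rewards, values, gammas))
```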
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Matthieu_Geist1
Submission Number: 993