Direct Advantage Estimation in Partially Observable Environments

Published: 01 Aug 2024, Last Modified: 09 Oct 2024, Venue: EWRL17, License: CC BY 4.0
Keywords: POMDP, advantage function, deep RL, off-policy learning
Abstract: Direct Advantage Estimation (DAE) was recently shown to improve the sample efficiency of deep reinforcement learning (deep RL) algorithms; however, DAE assumes full observability of the environment, which may be restrictive in realistic settings. In the present work, we first show that DAE can be extended to partially observable domains with minor modifications. Second, we use discrete latent space models to address the increased computational cost that arises from the need to approximate the transition probabilities. Finally, we empirically evaluate the proposed method on the Arcade Learning Environment and show that it is scalable and sample-efficient.
Submission Number: 54