Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

20 Sept 2025 (modified: 15 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: deep reinforcement learning, advantage estimation, arcade learning environment
Abstract: Direct Advantage Estimation (DAE) has been shown to improve the sample efficiency of deep reinforcement learning. However, its reliance on full environment observability limits applicability in realistic settings. In the present work, we (i) extend DAE to partially observable domains with minimal modifications, and (ii) reduce its computational overhead by introducing discrete latent dynamics models to approximate transition probabilities efficiently. We evaluate our approach on the Arcade Learning Environment and find that DAE scales with function approximator capacity while maintaining high sample efficiency.
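As background for the first sentence of the abstract, here is a minimal sketch of the baseline DAE objective for deterministic, fully observed environments. It follows the earlier DAE formulation and does not cover the partially observable or latent-dynamics extension proposed here. That formulation fits a π-centered advantage estimate, one that satisfies Σ_a π(a|s) Â(s,a) = 0, by minimizing the squared error between V(s_0) and the advantage-corrected n-step return Σ_t γ^t (r_t − Â(s_t,a_t)) + γ^n V(s_n). The function names (`center_advantage`, `dae_loss`), the single-segment setup, and the NumPy array layout are illustrative assumptions. This is not the authors' implementation.

```python
import numpy as np

def center_advantage(raw_adv, policy):
    """Project raw advantage outputs so that sum_a pi(a|s) * A(s, a) = 0 per state.

    raw_adv, policy: arrays of shape (T, num_actions). Hypothetical helper.
    """
    baseline = np.sum(policy * raw_adv, axis=-1, keepdims=True)
    return raw_adv - baseline

def dae_loss(rewards, actions, raw_adv, policy, values, gamma):
    """Squared-error DAE objective for one n-step segment starting at s_0.

    rewards: (n,) floats; actions: (n,) ints;
    raw_adv, policy: (n, num_actions);
    values: (n + 1,) estimates V(s_0), ..., V(s_n); values[n] bootstraps the tail.
    Assumes a deterministic, fully observed environment (baseline DAE setting).
    """
    n = len(rewards)
    adv = center_advantage(raw_adv, policy)
    taken = adv[np.arange(n), actions]        # A(s_t, a_t) for the actions taken
    discounts = gamma ** np.arange(n)         # gamma^t for t = 0..n-1
    target = np.sum(discounts * (rewards - taken)) + gamma ** n * values[n]
    return (target - values[0]) ** 2
```

For example, with one step, reward 1.0, a uniform two-action policy, centered advantages (0.5, −0.5), the first action taken, γ = 0.9, and V(s_1) = 1.0, the loss is zero exactly when V(s_0) = 1.0 − 0.5 + 0.9 = 1.4. The centering step enforces the π-centered constraint by construction, so the regression does not need a separate penalty term.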
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 25036