Keywords: DQN, Dueling, Value-sharing
TL;DR: We propose the mean-expansion transformation, a parameter-free layer that can be added at the end of a Q-network to increase sample efficiency and improve performance.
Abstract: Action-values are foundational to many control algorithms such as Q-learning, so learning them efficiently is central to reinforcement learning (RL).
However, action-value learning can be slow, requiring many updates to move values from their initialization, typically near zero, to their true values, which may be far from zero.
Moreover, action-value learning algorithms typically update each state–action pair independently, without learning shared value structure across actions within a state.
In this paper, we address these inefficiencies by introducing the mean-expansion layer, which accelerates action-value learning by sharing values across actions within a state and by changing the problem from directly learning potentially large action-values to learning a lower-norm representation of them.
In deep RL, this layer can be applied as a parameter-free addition to Q-network architectures without altering the underlying algorithm.
Empirically, we show that it improves the aggregate performance of DQN and IQN across 57 Atari games while increasing action gaps and dramatically reducing value overestimation.
Journal Edition Interest: No
Submission Number: 34