Abstract: Data augmentation methods have proven highly effective in supervised learning domains, where semantically invariant perturbations can be easily applied to labeled input examples. However, some augmentations used in deep reinforcement learning cannot guarantee consistent semantics in the transition samples, leading to incorrect value estimation and optimization during training. In this paper, we focus on symmetric consistency in states and actions and propose Symmetric DQN, which maintains this consistency for better performance and data efficiency in small-data regimes. Specifically, we apply a consistent state flip and action projection to the original interaction transitions to construct corresponding symmetric ones. The symmetric transitions have the same intermediate rewards and terminal signals as the original ones but mirrored states and actions. Therefore, by optimizing both the original and the symmetric losses, Symmetric DQN facilitates value estimation from two directions per transition. We illustrate the promise of Symmetric DQN with experiments on the Atari 100K benchmark, where it achieves a median HNS of 27.6% and a mean HNS of 51.1%, surpassing most previous deep reinforcement learning methods that use data augmentation.
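The symmetric-transition construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the action names and the `ACTION_MIRROR` projection table are assumptions standing in for whatever action set a given Atari game exposes; states are assumed to be arrays whose last axis is image width.

```python
import numpy as np

# Hypothetical action indices for an Atari-style game (names are assumptions,
# not taken from the paper; real games expose their own discrete action sets).
NOOP, LEFT, RIGHT = 0, 1, 2

# Action projection under a horizontal flip: LEFT and RIGHT swap, NOOP is fixed.
ACTION_MIRROR = {NOOP: NOOP, LEFT: RIGHT, RIGHT: LEFT}


def symmetric_transition(state, action, reward, next_state, done):
    """Construct the mirrored counterpart of an interaction transition.

    States are flipped along the width (last) axis and the action is
    projected through ACTION_MIRROR, while the reward and terminal signal
    are kept unchanged, matching the abstract's description of symmetric
    transitions. A DQN-style agent could then sum the TD losses computed
    on the original and on the mirrored transition.
    """
    sym_state = np.flip(state, axis=-1).copy()       # horizontal flip of (H, W)
    sym_next_state = np.flip(next_state, axis=-1).copy()
    sym_action = ACTION_MIRROR[action]               # e.g. LEFT -> RIGHT
    return sym_state, sym_action, reward, sym_next_state, done
```

Flipping twice recovers the original transition, so the augmentation is its own inverse; each stored transition yields exactly one extra semantically consistent training sample.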