Keywords: Deep Reinforcement Learning, Experience Replay, Sample Efficiency, Continuous Control
TL;DR: We interpolate neighboring samples in experience replay buffers to improve sample efficiency of deep reinforcement learning agents performing continuous control tasks.
Abstract: Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose the use of Mixup to further improve sample efficiency via synthetic sample generation. We build upon this idea with Neighborhood Mixup Experience Replay (NMER), a modular replay buffer that interpolates transitions with their closest neighbors in normalized state-action space. NMER preserves a locally linear approximation of the transition manifold by only performing Mixup between transitions with similar state-action features. Under NMER, a given transition’s set of state-action neighbors is dynamic and episode agnostic, in turn encouraging greater policy generalizability via cross-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate on several continuous control environments. We observe that NMER improves sample efficiency by an average of 87% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data.
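The sampling step described in the abstract can be sketched in a few lines. Below is a minimal NumPy illustration of the core NMER idea as stated above: draw a minibatch, find each sampled transition's nearest neighbor in normalized state-action space, and apply Mixup across every transition component. The function name, brute-force neighbor search, and Beta(alpha, alpha) mixing coefficient are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nmer_sample(states, actions, rewards, next_states, batch_size, alpha=1.0, rng=None):
    """Hypothetical sketch of Neighborhood Mixup Experience Replay sampling.

    Interpolates each sampled transition with its nearest neighbor in
    normalized state-action space, following the idea in the abstract.
    """
    rng = rng or np.random.default_rng()
    n = len(states)

    # Normalize state-action features so neighbor distances are scale-invariant.
    sa = np.concatenate([states, actions], axis=1)
    sa = (sa - sa.mean(axis=0)) / (sa.std(axis=0) + 1e-8)

    # Draw a minibatch, then find each transition's nearest neighbor
    # (excluding itself) over the whole buffer. A kNN index would replace
    # this brute-force search in practice.
    idx = rng.integers(0, n, size=batch_size)
    dists = np.linalg.norm(sa[idx, None, :] - sa[None, :, :], axis=-1)
    dists[np.arange(batch_size), idx] = np.inf  # exclude self-matches
    nbr = dists.argmin(axis=1)

    # Mixup: convex combination of each transition with its neighbor,
    # with lambda ~ Beta(alpha, alpha), applied to all components.
    lam = rng.beta(alpha, alpha, size=(batch_size, 1))
    mix = lambda a, b: lam * a + (1.0 - lam) * b
    return (
        mix(states[idx], states[nbr]),
        mix(actions[idx], actions[nbr]),
        mix(rewards[idx].reshape(-1, 1), rewards[nbr].reshape(-1, 1)),
        mix(next_states[idx], next_states[nbr]),
    )
```

Because neighbors are looked up in a shared normalized feature space rather than within episodes, interpolation freely mixes transitions from different trajectories, which is the cross-episode property the abstract highlights.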