Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

12 Oct 2021, 19:37 (modified: 07 Dec 2021, 23:40)
Deep RL Workshop NeurIPS 2021
Readers: Everyone
Keywords: Deep Reinforcement Learning, Experience Replay, Sample Efficiency, Continuous Control
TL;DR: We interpolate neighboring samples in experience replay buffers to improve sample efficiency of deep reinforcement learning agents performing continuous control tasks.
Abstract: Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent work on experience replay proposes using Mixup [35] to further improve sample efficiency by generating synthetic samples. We build upon this idea with Neighborhood Mixup Experience Replay (NMER), a modular replay buffer that interpolates transitions with their closest neighbors in normalized state-action space. NMER preserves a locally linear approximation of the transition manifold by performing Mixup only between transitions with similar state-action features. Under NMER, a given transition’s set of state-action neighbors is dynamic and episode-agnostic, which encourages greater policy generalizability via cross-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate it on several continuous control environments. We observe that NMER improves sample efficiency by an average of 87% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data.
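The sampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not specify: z-score normalization of the concatenated state-action features, a single nearest neighbor chosen by Euclidean distance, and a mixing coefficient drawn from Beta(alpha, alpha) as in standard Mixup. The function name `nmer_sample` and all of its parameters are hypothetical.

```python
import numpy as np


def nmer_sample(states, actions, rewards, next_states, batch_size, alpha=1.0, rng=None):
    """Sketch of neighborhood Mixup sampling (hypothetical helper, not the paper's code).

    Assumes 2-D `states`, `actions`, `next_states` of shape (N, dim) and N >= 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float).reshape(-1, 1)

    # Assumption: z-score normalization of concatenated state-action features.
    sa = np.concatenate([states, actions], axis=1)
    z = (sa - sa.mean(axis=0)) / (sa.std(axis=0) + 1e-8)

    # Draw anchor transitions uniformly from the buffer.
    idx = rng.integers(0, len(z), size=batch_size)

    # Squared Euclidean distance from each anchor to every stored transition.
    dist = ((z[idx, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    dist[np.arange(batch_size), idx] = np.inf  # exclude the anchor itself
    nbr = dist.argmin(axis=1)  # assumption: use the single nearest neighbor

    # Mixup coefficient, one per sampled pair, lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(batch_size, 1))

    def mix(x):
        # Convex combination of each anchor with its neighbor.
        return lam * x[idx] + (1.0 - lam) * x[nbr]

    return mix(states), mix(actions), mix(rewards), mix(next_states)
```

Because neighbors are recomputed against the whole buffer at sampling time, an anchor's neighbor can come from a different episode, which matches the abstract's description of a dynamic, episode-agnostic neighborhood. The brute-force distance computation is for clarity only and would need a nearest-neighbor index for large buffers.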
Supplementary Material: zip