Prioritizing States with Action Sensitive Return in Experience Replay

Published: 20 Jul 2023, Last Modified: 31 Aug 2023 (EWRL16)
Keywords: Experience Replay, Replay Prioritization, Function Approximators, n-Step Returns, Action Sensitivity, Objective Mismatch
TL;DR: Prioritizing replay of states where the return is more sensitive to the chosen action allows for more sample-efficient and stable learning, especially with smaller parametric function approximators.
Abstract: Experience replay for off-policy reinforcement learning has been shown to improve sample efficiency and stabilize training. However, typical uniformly sampled replay includes many samples that are irrelevant to the agent reaching good performance. We introduce Action Sensitive Experience Replay (ASER), a method that prioritizes samples in the replay buffer and selectively models parts of the state space more accurately where choosing sub-optimal actions has a larger effect on the return. We show experimentally that this can make training more sample-efficient and that it allows smaller parametric function approximators -- like neural networks with few neurons -- to achieve good performance in environments where they would otherwise struggle.
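The abstract does not give ASER's exact priority formula, so the following is only a minimal sketch of the general idea: action sensitivity is proxied here by the spread of the current Q-estimates across actions, so states where the choice of action changes the predicted return most are replayed more often. The class name `ActionSensitiveReplayBuffer` and all parameters are hypothetical, not the paper's method.

```python
import numpy as np

class ActionSensitiveReplayBuffer:
    """Sketch of a replay buffer that prioritizes action-sensitive states."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every sample's probability above zero
        self.storage = []       # transitions (s, a, r, s_next, done)
        self.priorities = []    # one priority per stored transition
        self.pos = 0

    def add(self, transition, q_values):
        # q_values: current estimates Q(s, a) for all actions at state s.
        # Assumed sensitivity proxy: max_a Q(s, a) - min_a Q(s, a), i.e. how
        # much the predicted return depends on which action is chosen.
        priority = (np.max(q_values) - np.min(q_values) + self.eps) ** self.alpha
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
            self.priorities.append(priority)
        else:
            # Overwrite the oldest entry once the buffer is full.
            self.storage[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=np.random):
        # Sample transitions with probability proportional to priority.
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = rng.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx]
```

With alpha = 0 this reduces to uniform replay; larger alpha concentrates updates on states where acting sub-optimally is costly, which is the intuition the abstract describes.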