Keywords: Experience Replay More, Key Transitions, Sampling, Add Noise to Noise, Deep Reinforcement Learning
Abstract: We propose an experience replay mechanism for Deep Reinforcement Learning built on Add Noise to Noise (AN2N), which requires the agent to replay more of the experience containing key states; we abbreviate it as Experience Replay More (ERM). In the AN2N algorithm, the states where the agent should explore more are referred to as key states. We find that how the transitions containing key states participate in updating the policy and Q networks has a significant impact on the performance of the deep reinforcement learning agent, and that the problem of catastrophic forgetting in neural networks is further magnified in the AN2N algorithm. We therefore replace the usual uniform sampling of experience transitions: transitions are sampled for replay according to whether they contain key states and whether they were generated most recently, which is the core idea of the ERM algorithm. Experimental results show that this mechanism significantly improves agent performance. We combine ERM with Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic policy gradient (TD3), and Soft Actor-Critic (SAC), and evaluate the algorithms on a suite of OpenAI Gym tasks; SAC with ERM achieves a new state of the art, and DDPG with ERM can even exceed the average performance of SAC under certain random seeds.
One-sentence Summary: We propose an experience replay mechanism for Deep Reinforcement Learning based on Add Noise to Noise (AN2N) that requires the agent to replay more key transitions, abbreviated as Experience Replay More (ERM).
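The sampling rule described in the abstract (replaying transitions that contain key states, or that were generated most recently, more often than uniform sampling would) can be sketched as follows. This is a minimal illustration of the idea only: the class name, the `key_fraction` and `recent_window` parameters, and the mixing scheme are assumptions for the sake of the example, not the authors' actual implementation.

```python
import random
from collections import deque

class ERMReplayBuffer:
    """Hypothetical sketch of ERM-style sampling: transitions flagged as
    containing a key state, or added recently, are replayed more often
    than under uniform sampling. Ratios and names are illustrative."""

    def __init__(self, capacity=1_000_000, key_fraction=0.5, recent_window=1000):
        self.buffer = deque(maxlen=capacity)
        self.key_fraction = key_fraction    # share of each batch drawn from key/recent transitions
        self.recent_window = recent_window  # how many of the newest transitions count as "recent"

    def add(self, state, action, reward, next_state, done, is_key_state):
        # is_key_state: flag supplied by the AN2N-style criterion for key states.
        self.buffer.append((state, action, reward, next_state, done, is_key_state))

    def sample(self, batch_size):
        data = list(self.buffer)
        # Pool of "important" transitions: contain a key state or are among the newest.
        important = [t for t in data if t[5]] + data[-self.recent_window:]

        n_important = min(int(batch_size * self.key_fraction), len(important))
        batch = random.sample(important, n_important) if n_important > 0 else []
        # Fill the remainder of the batch with uniform samples from the whole buffer.
        batch += random.sample(data, min(batch_size - n_important, len(data)))
        return batch
```

Such a buffer would be used in place of a uniform replay buffer inside DDPG, TD3, or SAC, with the rest of the training loop unchanged.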