A Novel Reinforcement Learning Sampling Method Without Additional Environment Feedback in Hindsight Experience Replay
Abstract: Hindsight Experience Replay (HER) in reinforcement learning trains an agent by substituting the real goal with hindsight goals (virtual goals). This technique improves data efficiency and speeds up the learning process. To efficiently choose hindsight goals, previous research proposed an Energy-Based Prioritization (EBP) method. However, for complex robotic tasks in which the RL agent interacts with objects in the environment, EBP requires object information such as location and velocity, which is not feasible in real-world applications. In this paper, we propose a Trajectory Behaviour Prioritization (TBP) method that removes the need for additional environment feedback while maintaining competitive learning performance. We define a trajectory behaviour weight function to capture good behaviours within a trajectory. We evaluate our TBP approach on two challenging robotic manipulation tasks in simulation. The results show that our approach performs well despite having no object-related information. This work serves as a step towards accelerating the training of reinforcement learning for complex real-world robotics tasks.