Accelerating the Training of Reinforcement Learning Agents by Utilizing Current and Previous Experience
Abstract: In this paper, we examine three extensions to the Q-function Targets via Optimization (QT-Opt) algorithm and empirically study their effects on training time for complex robotic tasks. The vanilla QT-Opt algorithm requires a large amount of offline data (several months of collection with multiple robots) for training, which is hard to obtain in practice. To bridge the gap between basic reinforcement learning research and real-world robotic applications, we first propose using hindsight goal techniques (Hindsight Experience Replay, Hindsight Goal Generation) and Energy-Based Prioritization (EBP) to increase data efficiency in reinforcement learning. We then propose an efficient offline data collection method based on PD control together with a dynamic buffer. Our experiments show that data collection and agent training for a robotic grasping task together take only about one day, while learning performance remains high (80% success rate). This work serves as a step towards accelerating the training of reinforcement learning for complex real-world robotics tasks.