Accelerating the Training of Reinforcement Learning Agents by Utilizing Current and Previous Experience
Abstract: In this paper, we examine three extensions to the Q-function Targets via Optimization (QT-Opt) algorithm and empirically study their effects on training time for complex robotic tasks. The vanilla QT-Opt algorithm requires a large amount of offline data (several months of collection with multiple robots) for training, which is hard to obtain in practice. To bridge the gap between basic reinforcement learning research and real-world robotic applications, we first propose using hindsight goal techniques (Hindsight Experience Replay, Hindsight Goal Generation) and Energy-Based Prioritization (EBP) to increase data efficiency in reinforcement learning. We then propose an efficient offline data collection method based on PD control together with a dynamic buffer. Our experiments show that data collection and agent training for a robotic grasping task together take only about one day, while learning performance remains high (80% success rate). This work serves as a step towards accelerating the training of reinforcement learning for complex real-world robotics tasks.