Abstract: Recent progress in deep reinforcement learning has enabled agents to autonomously learn complex control strategies from scratch. Model-free approaches such as Deep Deterministic Policy Gradient (DDPG) seem promising for applications with intricate dynamics, such as contact-rich manipulation tasks. However, these methods typically require large amounts of training data or meticulous hyperparameter tuning, limiting their usefulness for real-world robotics applications. In this paper, we evaluate and benchmark our recently proposed approach for improving model-free reinforcement learning with DDPG through Qgraph-based bounds in temporal difference learning. We apply the algorithm directly to a challenging real-world industrial insertion task and assess its performance (see https://youtu.be/Z_GcNbCWE-E). Empirical results show that the insertion task can be learned despite significant frictional forces and uncertainty, even in sparse-reward settings. We present an in-depth comparison based on a large number of experiments and demonstrate the advantages and performance of Qgraph-bounded DDPG: the learning process can be significantly sped up, made more robust to poor hyperparameter choices, and run with a smaller memory footprint. Lastly, the presented results extend the current theoretical understanding of the link between data graph structure and soft divergence in DDPG.
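To make the core idea of Qgraph-based bounds in temporal difference learning concrete, the sketch below shows one way a lower bound derived from a graph over stored transitions could be used to clip the bootstrapped TD target in a DDPG critic update. This is a minimal illustration inferred from the abstract, not the authors' implementation; the helper `qgraph_lower_bound`, the network sizes, and the exact point at which the bound is applied are assumptions.

```python
# Minimal sketch: DDPG critic update with a Q-graph-style lower bound on the TD target.
# All names and design details below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

GAMMA = 0.99

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def qgraph_lower_bound(obs, act):
    """Hypothetical placeholder: returns a lower bound on Q(s, a), e.g. computed by
    value iteration on a graph built from previously recorded transitions.
    Here it is a constant pessimistic value purely for illustration."""
    return torch.full((obs.shape[0], 1), -10.0)

def critic_update(critic, critic_target, actor_target, optimizer, batch):
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_act = actor_target(next_obs)
        # Standard DDPG bootstrapped target.
        td_target = rew + GAMMA * (1.0 - done) * critic_target(next_obs, next_act)
        # Clip the target from below with the graph-based bound; the intent is to
        # keep the critic's targets from drifting arbitrarily low (soft divergence).
        td_target = torch.maximum(td_target, qgraph_lower_bound(obs, act))
    loss = nn.functional.mse_loss(critic(obs, act), td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the graph-derived values act as a safety net under the learned critic's targets, which is one plausible mechanism for the reported robustness to hyperparameter choices; the actual construction of the Q-graph and the precise bounding rule are described in the paper itself.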