Pathologies and Challenges of Using Differentiable Simulators in Policy Optimization for Contact-Rich Manipulation

Published: 14 May 2022, Last Modified: 17 May 2023
ICRA 2022 Workshop: RL for Manipulation (Oral)
Keywords: Policy Optimization, Differentiable Simulation, Policy Gradients, Contact-Rich Manipulation
TL;DR: Differentiable simulators do not always give better policy gradients for contact-rich manipulation.
Abstract: Policy search methods in Reinforcement Learning (RL) have shown impressive results on contact-rich tasks such as dexterous manipulation. However, the high variance of zero-order Monte-Carlo gradient estimates leads to slow convergence and a need for large numbers of samples. By replacing these zero-order gradient estimates with first-order ones, differentiable simulators promise faster computation for policy gradient methods when the model is known. Contrary to this belief, we highlight several pathologies of using first-order gradients and show that in many physical scenarios involving rich contact, zero-order gradients yield better performance. Building on these pathologies and lessons, we propose guidelines for designing differentiable simulators, as well as policy optimization algorithms that use them. In doing so, we hope to reap the benefits of first-order gradients while avoiding their potential pitfalls.
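To make the contrast in the abstract concrete, below is a minimal sketch (not from the paper) of the two gradient estimators being compared: a first-order gradient obtained by differentiating through a toy non-smooth, contact-like objective, and a zero-order Monte-Carlo estimate of the gradient of its Gaussian-smoothed counterpart. The objective, sample sizes, and smoothing scale are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch of first-order vs. zero-order gradient estimates on a toy
# "contact-like" objective with a hard kink (illustrative assumptions only).
import jax
import jax.numpy as jnp


def objective(theta):
    # Toy non-smooth objective: the max() introduces a ReLU-like kink,
    # loosely standing in for a contact activation in a simulator.
    return jnp.maximum(theta, 0.0) ** 2 - 0.5 * theta


# First-order estimate: differentiate straight through the objective,
# as a differentiable simulator would via automatic differentiation.
first_order_grad = jax.grad(objective)


def zero_order_grad(theta, key, sigma=0.1, num_samples=64):
    # Zero-order Monte-Carlo estimate of the gradient of the
    # Gaussian-smoothed objective (REINFORCE / randomized-smoothing style).
    eps = jax.random.normal(key, (num_samples,))
    values = jax.vmap(objective)(theta + sigma * eps)
    return jnp.mean(values * eps) / sigma


key = jax.random.PRNGKey(0)
theta = jnp.array(0.0)  # exactly at the kink, where the estimates disagree
print("first-order gradient:", first_order_grad(theta))
print("zero-order gradient: ", zero_order_grad(theta, key))
```

At the kink, the first-order gradient reflects only the locally selected branch of the dynamics, while the zero-order estimate averages over nearby outcomes; this difference is the kind of discrepancy the paper's pathologies revolve around.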