Accelerated Deep Reinforcement Learning of Terrain-Adaptive Locomotion Skills

Khaled S. Refaat; Kai Ding

12 Oct 2021 (modified: 05 May 2023)
Deep RL Workshop NeurIPS 2021

Keywords: Reinforcement Learning, Dynamic Terrains, Legged Locomotion Skills, Simulated Characters, Accelerated Learning, Deep Reinforcement Learning

TL;DR: We demonstrate how to accelerate deep reinforcement learning for acquiring locomotion skills on highly dynamic and challenging terrains.

Abstract: Learning locomotion skills on dynamic terrains makes it possible to create realistic animations without recording motion capture data. The simulated character is trained to traverse varied terrain and avoid obstacles while maintaining balance and agility. Model-free reinforcement learning has been used to develop such skills for simulated characters. In particular, a mixture of actor-critic experts (MACE) was recently shown to enable learning of such complex skills by promoting specialization and incorporating human knowledge. However, this approach still requires a very large number of training interactions and explorations with a computationally expensive simulator. We demonstrate how to accelerate model-free reinforcement learning to acquire terrain-adaptive locomotion skills and how to reduce the need for large-scale exploration. We first generalize model-based value expansion (MVE) to a mixture of actor-critic experts and show the conditions under which the method accelerates learning in this generalized setting. This motivates combining MACE with MVE, resulting in the MACE-MVE algorithm. We then propose learning to predict future terrains, character states, rewards, and the probability of falling with convolutional networks, in order to speed up learning with generalized MVE. We analyze our approach empirically and show that it can substantially speed up learning of such challenging skills. Finally, we study the effect of various design choices for controlling uncertainty and managing dynamics fidelity.
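The abstract describes model-based value expansion only at a high level. As a minimal sketch, assuming a learned model has already produced an imagined H-step rollout with predicted rewards r̂_t and fall probabilities p̂_t, one way to form MVE critic targets is the backward recursion y_t = r̂_t + γ(1 − p̂_t) y_{t+1}, bootstrapped at the horizon with y_H = max_i V_i(ŝ_H). Treating a fall as episode termination and bootstrapping from the best expert's critic (in the spirit of MACE's expert selection) are both assumptions for illustration, not the paper's exact formulation; the function name `mve_targets` and its parameters are hypothetical.

```python
from typing import List, Sequence


def mve_targets(
    rewards: Sequence[float],
    fall_probs: Sequence[float],
    horizon_expert_values: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Hypothetical H-step model-based value expansion targets (sketch).

    rewards[t]               -- model-predicted reward at imagined step t
    fall_probs[t]            -- model-predicted probability of falling at step t,
                                treated here as termination (an assumption)
    horizon_expert_values[i] -- critic value V_i of expert i at the final imagined state
    Returns one target per imagined state s_0 .. s_{H-1}.
    """
    if len(rewards) != len(fall_probs):
        raise ValueError("rewards and fall_probs must have the same length")
    if not horizon_expert_values:
        raise ValueError("need at least one expert value to bootstrap from")

    # Bootstrap from the best expert at the horizon (MACE-style choice, assumed).
    g = max(horizon_expert_values)
    targets = [0.0] * len(rewards)
    # Walk backwards: y_t = r_t + gamma * P(no fall at t) * y_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * (1.0 - fall_probs[t]) * g
        targets[t] = g
    return targets


# Usage: a 2-step imagined rollout with no predicted falls.
print(mve_targets([1.0, 1.0], [0.0, 0.0], [2.0, 5.0], gamma=0.5))
```

A certain fall (p̂_t = 1) cuts off everything after step t, so the target at that step reduces to the immediate reward; longer horizons rely more on the learned model and less on the critic, which is the trade-off MVE exposes.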

Supplementary Material: zip