Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning

Charles Schaff; David Yunis; Ayan Chakrabarti; Matthew R. Walter

15 Feb 2018 (modified: 22 Jun 2025); ICLR 2018 Conference Blind Submission; Readers: Everyone

Abstract: The physical design of a robot and the policy that controls its motion are inherently coupled. However, existing approaches largely ignore this coupling and instead alternate between separate design and control phases, which requires expert intuition throughout and risks convergence to suboptimal designs. In this work, we propose a method that jointly optimizes over the physical design of a robot and the corresponding control policy in a model-free fashion, without any need for expert supervision. Given an arbitrary robot morphology, our method maintains a distribution over the design parameters and uses reinforcement learning to train a neural network controller. Throughout training, we refine the robot distribution to maximize the expected reward. This yields an assignment of the robot parameters and a neural network policy that are jointly optimal. We evaluate our approach in the context of legged locomotion and demonstrate that it discovers novel robot designs and walking gaits for several different morphologies, achieving performance comparable to or better than that of hand-crafted designs.

TL;DR: Use deep reinforcement learning to design the physical attributes of a robot jointly with a control policy.

Keywords: robot locomotion, reinforcement learning, policy gradients, physical design, deep learning
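The loop described in the abstract (sample designs from a distribution, act with a policy, then refine both toward higher expected reward) can be illustrated with a toy sketch. This is not the authors' implementation. The reward function `toy_reward`, the one-dimensional Gaussian design distribution with fixed `sigma`, and the single-weight linear policy `w` are all hypothetical stand-ins for a physics simulator and a neural network controller, and both are updated with a plain score-function (REINFORCE) gradient chosen here only for brevity.

```python
import numpy as np


def toy_reward(design, action):
    # Hypothetical stand-in for a simulated rollout return (assumption).
    # The best design is 2.0, and the best action depends on the design,
    # which mimics the design/control coupling.
    return -(design - 2.0) ** 2 - (action - 0.5 * design) ** 2


def train(steps=2000, batch=64, lr=0.02, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 0.5   # Gaussian over one design parameter (sigma kept fixed)
    w, act_std = 0.0, 0.3  # toy linear policy: action ~ N(w * design, act_std^2)

    for _ in range(steps):
        # Sample a batch of designs, then one action per design.
        designs = mu + sigma * rng.standard_normal(batch)
        actions = w * designs + act_std * rng.standard_normal(batch)
        rewards = toy_reward(designs, actions)

        # Subtract the batch mean as a simple variance-reducing baseline.
        adv = rewards - rewards.mean()

        # Score-function gradients of expected reward with respect to the
        # design-distribution mean and the policy weight.
        grad_mu = np.mean(adv * (designs - mu) / sigma ** 2)
        grad_w = np.mean(adv * (actions - w * designs) * designs / act_std ** 2)

        # Gradient ascent on both sets of parameters in the same loop.
        mu += lr * grad_mu
        w += lr * grad_w

    return mu, w


if __name__ == "__main__":
    mu, w = train()
    print(f"design mean: {mu:.3f}, policy weight: {w:.3f}")
```

In this toy setting the expected reward is maximized at a design mean of 2.0 and a policy weight of 0.5, so `train()` should drift toward those values. A real system would replace `toy_reward` with simulator rollouts and `w` with a neural network policy.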

Code: [3 community implementations](https://paperswithcode.com/paper/?openreview=SyfiiMZA-)

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/jointly-learning-to-construct-and-control/code)
