Efficient Morphology–Control Co-Design via Stackelberg PPO under Non-Differentiable Leader–Follower Interfaces
Keywords: Morphology–Control Co-Design, Stackelberg Game, Policy Gradient, Proximal Policy Optimization, Non-Differentiable Leader–Follower Interactions, Reinforcement Learning
TL;DR: We propose Stackelberg PPO, a policy gradient method that efficiently co-optimizes robot morphology and control under non-differentiable leader-follower interfaces.
Abstract: Morphology–control co-design concerns the coupled optimization of an agent’s body structure and its control policy. A key challenge is that evaluating each candidate morphology requires extensive rollouts to re-optimize control and assess its quality, leading to high computational cost and slow convergence. This challenge is compounded by the non-differentiable interaction between morphology and control, stemming from discrete design choices and rollout-based evaluation, which blocks gradient flow across the morphology–control interface and forces reliance on costly rollout-driven optimization. To address these challenges, we highlight that the co-design problem can be formulated as a novel variant of a Stackelberg Markov game, a hierarchical framework in which the leader specifies the morphology and the follower adapts the control. Building on this formulation, we propose \emph{Stackelberg Proximal Policy Optimization (Stackelberg PPO)}, a policy gradient method that exploits the intrinsic coupling between leader and follower to reduce repeated control re-optimization and to optimize more efficiently across the non-differentiable interface. Experiments across diverse co-design tasks demonstrate that Stackelberg PPO outperforms standard PPO in both stability and final performance.
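To make the leader–follower structure concrete, below is a minimal toy sketch of a Stackelberg-style co-design loop, not the authors' Stackelberg PPO implementation. It assumes a hypothetical 1D regulation task whose dynamics depend on a discrete morphology choice (an "arm gain"); the leader is a categorical distribution over morphologies updated with REINFORCE from rollout returns (the non-differentiable interface), and the follower is a linear-Gaussian controller updated with a PPO-style clipped objective and warm-started across leader iterations.

```python
# Illustrative sketch only: toy bilevel (leader-follower) co-design loop.
# All task details (MORPHS, dynamics, reward) are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
MORPHS = np.array([0.5, 1.0, 2.0])      # candidate discrete designs ("arm gains")
leader_logits = np.zeros(len(MORPHS))   # leader: distribution over morphologies
theta = np.zeros(2)                     # follower: linear policy weights [k_pos, bias]
SIGMA, CLIP, H = 0.3, 0.2, 20           # action noise, PPO clip range, horizon

def rollout(gain, theta):
    """Roll out the follower; the morphology enters only through `gain` (rollout-based evaluation)."""
    pos, states, actions, logps, ret = 1.0, [], [], [], 0.0
    for _ in range(H):
        mu = theta[0] * pos + theta[1]
        a = mu + SIGMA * rng.standard_normal()
        states.append(pos); actions.append(a)
        logps.append(-0.5 * ((a - mu) / SIGMA) ** 2)
        pos = pos + gain * a             # leader's discrete choice is non-differentiable
        ret += -pos ** 2                 # reward: drive the state toward the origin
    return np.array(states), np.array(actions), np.array(logps), ret

def ppo_step(gain, theta, lr=1e-2, iters=5):
    """Follower adapts to the sampled morphology with clipped policy-gradient updates."""
    states, actions, old_logps, ret = rollout(gain, theta)
    adv = -(states ** 2)                 # crude per-step advantage proxy
    adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    for _ in range(iters):
        mu = theta[0] * states + theta[1]
        logps = -0.5 * ((actions - mu) / SIGMA) ** 2
        ratio = np.exp(logps - old_logps)
        clipped = np.clip(ratio, 1 - CLIP, 1 + CLIP)
        active = (ratio * adv <= clipped * adv)      # gradient flows only where the unclipped term is the min
        grad_mu = active * ratio * adv * (actions - mu) / SIGMA ** 2
        theta = theta + lr * np.array([np.mean(grad_mu * states), np.mean(grad_mu)])
    return theta, ret

for it in range(200):
    probs = np.exp(leader_logits) / np.exp(leader_logits).sum()
    m = rng.choice(len(MORPHS), p=probs)          # leader proposes a morphology
    theta, ret = ppo_step(MORPHS[m], theta)       # follower best-responds (warm-started)
    grad = -probs; grad[m] += 1.0                 # REINFORCE: grad of log prob of the sampled design
    leader_logits += 0.05 * (ret / H) * grad      # leader ascends the follower's post-adaptation return
print("preferred morphology:", MORPHS[np.argmax(leader_logits)])
```

Warm-starting the follower across leader iterations, rather than re-optimizing control from scratch for every candidate morphology, is the design choice this sketch uses to echo the paper's goal of reducing repeated control re-optimization.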
Primary Area: reinforcement learning
Submission Number: 6938