TL;DR: We propose an equivariant hierarchical framework for visuomotor policy learning
Abstract: Recent advances in hierarchical policy learning highlight the advantages of decomposing systems into high-level and low-level agents, enabling efficient long-horizon reasoning and precise fine-grained control. However, the interface between these hierarchy levels remains underexplored, and existing hierarchical methods often ignore domain symmetry, requiring extensive demonstrations to achieve robust performance. To address these issues, we propose Hierarchical Equivariant Policy (HEP), a novel hierarchical policy framework. At its core is a frame transfer interface, which uses the high-level agent's output as a coordinate frame for the low-level agent, providing a strong inductive bias while retaining flexibility. Additionally, we integrate domain symmetries into both levels and theoretically demonstrate the system's overall equivariance. HEP achieves state-of-the-art performance on complex robotic manipulation tasks, with significant improvements in both simulation and real-world settings.
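The frame transfer interface described in the abstract can be pictured with a minimal sketch (our own illustration, not the paper's implementation): the high-level agent outputs a pose that serves as a local coordinate frame, and the low-level agent's observations and actions are re-expressed relative to that frame. The use of 4x4 homogeneous transforms and all names below (`se3_inverse`, `to_local_frame`, the placeholder poses) are illustrative assumptions.

```python
import numpy as np

def se3_inverse(T):
    """Invert a 4x4 homogeneous transform."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def to_local_frame(T_world_keypose, T_world_x):
    """Re-express the world-frame pose T_world_x in the frame defined by T_world_keypose."""
    return se3_inverse(T_world_keypose) @ T_world_x

# Hypothetical usage: the high-level agent proposes a keypose in the world frame;
# the low-level agent then reasons about gripper poses relative to that keypose.
T_keypose = np.eye(4)   # placeholder high-level output (world frame)
T_gripper = np.eye(4)   # placeholder current gripper pose (world frame)
T_gripper_local = to_local_frame(T_keypose, T_gripper)
```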
Lay Summary: Robots can pick up new household or factory skills by watching human demonstrations, but they usually need hundreds of examples because even a tiny change—say, nudging an object a few centimetres—looks completely unfamiliar to them.
Our solution is a “two-brain” controller:
Big-picture brain: chooses the next general spot for the robot’s hand.
Fine-motion brain: works out the precise path to reach that spot.
Both brains are trained to recognise when the entire scene has simply been shifted or rotated, so they can recycle the same plan instead of starting over. They communicate through Frame Transfer, which lets the fine-motion brain reason in a local coordinate frame tied to the chosen spot.
In simulations and on real robots, this cut the training burden from hundreds of demonstrations to just a few dozen, and let robots finish long, multi-step tasks—like stacking blocks or scrubbing a pot—after seeing only a handful of examples.
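As a rough illustration of the symmetry property described above (a sketch under our own assumptions, not the authors' code): if the whole scene is rotated, an equivariant policy's predicted target should rotate by the same amount. The `policy` callable, the point-cloud input, and the tolerance are hypothetical stand-ins.

```python
import numpy as np

def rot_z(theta):
    """3x3 rotation about the vertical axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def check_equivariance(policy, points, theta=np.pi / 3, tol=1e-4):
    """Rotating the observed points (N x 3) should rotate the predicted 3D target."""
    R = rot_z(theta)
    target = policy(points)                # predicted 3D target position
    target_rotated = policy(points @ R.T)  # same scene, rotated
    return np.allclose(target_rotated, R @ target, atol=tol)
```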
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://codemasterzhao.github.io/HierEquiPo.github.io/
Primary Area: Applications->Robotics
Keywords: robot learning, imitation learning, robotic manipulation, equivariance
Submission Number: 13739