Keywords: Robotic Manipulation, Policy Learning, Equivariance
TL;DR: We propose the first SE(3)-equivariant policy learning framework that operates with only RGB image observations.
Abstract: Recent work has shown that equivariant policy networks can achieve strong performance on robot manipulation tasks with limited human demonstrations. However, existing equivariant methods typically require structured inputs, such as 3D point clouds or top-down camera views, which prevents their use in low-cost setups or dynamic environments. In this work, we propose the first $\mathrm{SE}(3)$-equivariant policy learning framework that operates with only RGB image observations. The key insight is to treat image-based data as collections of rays that, unlike 2D pixels, transform under 3D roto-translations. Extensive experiments, both in simulation across diverse robot configurations and in real-world settings, demonstrate that our method consistently surpasses strong baselines in both performance and efficiency.
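The ray view of an image mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a standard pinhole camera with intrinsics `K` and a camera-to-world pose `(R_wc, t_wc)`, and the function names `pixel_rays` and `transform_rays` are hypothetical. It only shows that when the camera pose is moved by a roto-translation `(R, t)`, each pixel's ray moves by the same roto-translation, whereas the pixel coordinates themselves do not change.

```python
import numpy as np

def pixel_rays(K, R_wc, t_wc, pixels):
    """Map pixel coordinates (N, 2) to world-frame rays (origins, unit directions).

    Assumes a pinhole camera: K is the 3x3 intrinsics matrix and
    (R_wc, t_wc) is the camera-to-world rotation and translation.
    """
    pixels = np.asarray(pixels, dtype=float)
    homog = np.hstack([pixels, np.ones((pixels.shape[0], 1))])  # rows are [u, v, 1]
    dirs_cam = homog @ np.linalg.inv(K).T                        # K^{-1} [u, v, 1]^T per pixel
    dirs_cam /= np.linalg.norm(dirs_cam, axis=1, keepdims=True)  # normalize to unit length
    dirs_world = dirs_cam @ R_wc.T                               # rotate into the world frame
    origins = np.broadcast_to(t_wc, dirs_world.shape).copy()     # all rays start at the camera center
    return origins, dirs_world

def transform_rays(R, t, origins, dirs):
    """Apply a roto-translation (R, t) to rays: origins move rigidly, directions only rotate."""
    return origins @ R.T + t, dirs @ R.T

def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Illustrative values only (not taken from the paper).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R_wc, t_wc = rot_z(0.3), np.array([0.1, -0.2, 0.5])
pixels = np.array([[320.0, 240.0], [100.0, 50.0], [600.0, 400.0]])
R, t = rot_z(1.1), np.array([0.4, 0.0, -0.3])

# Path 1: compute rays, then transform them.
o1, d1 = transform_rays(R, t, *pixel_rays(K, R_wc, t_wc, pixels))
# Path 2: transform the camera pose, then compute rays from the same pixels.
o2, d2 = pixel_rays(K, R @ R_wc, R @ t_wc + t, pixels)

rays_commute = bool(np.allclose(o1, o2) and np.allclose(d1, d2))
```

Both paths give the same rays, which is the sense in which rays, rather than raw pixel coordinates, carry a well-defined action of 3D roto-translations.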
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 21227