RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras

ICLR 2026 Conference Submission21227 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Robotic Manipulation, Policy Learning, Equivariance
TL;DR: We propose the first SE(3)-equivariant policy learning framework that operates with only RGB image observations.
Abstract: Recent work has shown that equivariant policy networks can achieve strong performance on robot manipulation tasks with limited human demonstrations. However, existing equivariant methods typically require structured inputs, such as 3D point clouds or top-down camera views, which prevents their use in low-cost setups or dynamic environments. In this work, we propose the first $\mathrm{SE}(3)$-equivariant policy learning framework that operates with only RGB image observations. The key insight is to treat image-based data as collections of rays that, unlike 2D pixels, transform under 3D roto-translations. Extensive experiments, both in simulation across diverse robot configurations and in real-world settings, demonstrate that our method consistently surpasses strong baselines in both performance and efficiency.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 21227
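
The abstract's key insight, that rays (unlike 2D pixels) transform under 3D roto-translations, can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a pinhole camera, and the intrinsics `K`, the poses, and the helper names `pixel_rays`, `transform_rays`, `rot_x`, and `rot_z` are all hypothetical choices made for illustration. It checks that moving the camera by a rigid motion and then casting rays gives the same result as casting rays first and then moving them.

```python
import numpy as np

def pixel_rays(K, R_wc, t_wc, pixels):
    """World-frame ray origins and unit directions for (u, v) pixels.

    Assumes a pinhole camera with intrinsics K and camera-to-world pose (R_wc, t_wc).
    """
    uv1 = np.concatenate([pixels, np.ones((len(pixels), 1))], axis=1)  # homogeneous pixels
    d_cam = uv1 @ np.linalg.inv(K).T                  # back-project into the camera frame
    d_cam /= np.linalg.norm(d_cam, axis=1, keepdims=True)
    d_world = d_cam @ R_wc.T                          # rotate directions into the world frame
    origins = np.broadcast_to(t_wc, d_world.shape).copy()  # every ray starts at the camera center
    return origins, d_world

def transform_rays(R, t, origins, directions):
    """Apply a rigid motion (R, t): origins move as points, directions are only rotated."""
    return origins @ R.T + t, directions @ R.T

def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical intrinsics, pixels, camera pose, and rigid motion (illustrative values only).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pixels = np.array([[0.0, 0.0], [320.0, 240.0], [639.0, 479.0]])
R_wc, t_wc = rot_x(0.3), np.array([0.1, -0.2, 1.0])
R, t = rot_z(0.7) @ rot_x(-0.4), np.array([0.5, 0.0, -0.3])

o, d = pixel_rays(K, R_wc, t_wc, pixels)
o_moved, d_moved = transform_rays(R, t, o, d)                  # cast rays, then move them
o_new, d_new = pixel_rays(K, R @ R_wc, R @ t_wc + t, pixels)   # move the camera, then cast rays

rays_commute = np.allclose(o_moved, o_new) and np.allclose(d_moved, d_new)
print(rays_commute)
```

The pixel coordinates are identical in both cases; only the ray representation carries the 3D motion, which is what makes rays a natural input for an $\mathrm{SE}(3)$-equivariant network.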