RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective

Published: 24 Apr 2024, Last Modified: 24 Apr 2024
Venue: ICRA 2024 Workshop on 3D Visual Representations for Robot Manipulation
License: CC BY 4.0
Keywords: imitation learning, 3D perception, robot learning
TL;DR: An end-to-end baseline for real-world robot imitation learning, built on efficient 3D perception.
Abstract: Precise robot manipulation requires rich spatial information in imitation learning, which remains a challenge for both 2D- and 3D-based policies. To tackle this problem, we present RISE, an end-to-end baseline for real-world imitation learning that predicts continuous actions directly from single-view point clouds. RISE compresses the point cloud into tokens with a sparse 3D encoder. After sparse positional encoding is added, the tokens are featurized by a transformer. Finally, a diffusion head decodes the features into robot actions. Trained with 50 demonstrations per real-world task, RISE surpasses current representative 2D and 3D policies by a large margin, showing significant advantages in both accuracy and efficiency.
Submission Number: 19
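The first two stages described in the abstract (compressing a point cloud into sparse tokens, then adding a sparse positional encoding) can be illustrated with a minimal, stdlib-only sketch. This is not the authors' implementation. The voxel size, the per-axis sinusoidal encoding, and the centroid "feature" that stands in for the learned sparse 3D encoder are all assumptions, and every function name here (`voxelize`, `sparse_positional_encoding`, `point_cloud_to_tokens`) is hypothetical. The transformer and diffusion head are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Sequence[Point], voxel_size: float) -> Dict[Voxel, List[Point]]:
    """Bucket points by integer voxel index; only occupied voxels are stored (sparse)."""
    voxels: Dict[Voxel, List[Point]] = {}
    for p in points:
        key = (math.floor(p[0] / voxel_size),
               math.floor(p[1] / voxel_size),
               math.floor(p[2] / voxel_size))
        voxels.setdefault(key, []).append(p)
    return voxels


def sinusoidal_1d(pos: int, dim: int) -> List[float]:
    """Transformer-style sin/cos encoding of one integer position into `dim` values."""
    out: List[float] = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        out.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return out


def sparse_positional_encoding(coord: Voxel, dim_per_axis: int) -> List[float]:
    """Assumed scheme: encode the x, y, z voxel indices separately and concatenate."""
    enc: List[float] = []
    for c in coord:
        enc.extend(sinusoidal_1d(c, dim_per_axis))
    return enc


def point_cloud_to_tokens(points: Sequence[Point], voxel_size: float,
                          dim_per_axis: int) -> List[Tuple[Voxel, List[float]]]:
    """Turn a point cloud into (voxel coordinate, token vector) pairs."""
    dim = 3 * dim_per_axis
    voxels = voxelize(points, voxel_size)
    tokens: List[Tuple[Voxel, List[float]]] = []
    for coord in sorted(voxels):
        pts = voxels[coord]
        centroid = [sum(p[k] for p in pts) / len(pts) for k in range(3)]
        # Hypothetical placeholder for the learned sparse 3D encoder's output:
        # the voxel centroid tiled to the token width.
        feature = [centroid[j % 3] for j in range(dim)]
        pe = sparse_positional_encoding(coord, dim_per_axis)
        # The positional encoding is added element-wise before the transformer.
        tokens.append((coord, [f + e for f, e in zip(feature, pe)]))
    return tokens


cloud = [(0.001, 0.002, 0.003), (0.004, 0.001, 0.002), (0.012, 0.0, 0.0)]
tokens = point_cloud_to_tokens(cloud, voxel_size=0.005, dim_per_axis=4)
print(len(tokens), [c for c, _ in tokens])  # 2 [(0, 0, 0), (2, 0, 0)]
```

Keeping only occupied voxels is what makes the representation sparse: the number of tokens grows with the occupied surface of the scene rather than with the volume of the full grid, which is one reason a sparse encoder can be efficient on single-view point clouds.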