Object-Aware Gaussian Splatting for Robotic Manipulation

Published: 24 Apr 2024, Last Modified: 25 May 2024
Venue: ICRA 2024 Workshop on 3D Visual Representations for Robot Manipulation
License: CC BY 4.0
Keywords: dynamic 3D reconstruction, robotic manipulation
TL;DR: We propose object-aware Gaussian splatting, which reconstructs dynamic robotic manipulation scenes with semantic features in real time at 30 Hz.
Abstract: Understanding the dynamics of our world in 3D is critical for the performance and robustness of robotics applications. Although recent progress has married vision foundation models and volumetric rendering to offer semantic 3D representations, neither the inference time of large models nor the update speed of volumetric representations meets the desired update rate of real-time robotic manipulation. In this work, we propose to inject “objectness” into a semantic representation based on 3D Gaussians. Gaussians with the same semantic label are initialized and updated together, leading to fast updates in response to robot and object movements. All necessary semantic information is extracted at the initial step from pretrained foundation models, which circumvents the inference bottleneck of large models while retaining semantic information. With only three camera views, our proposed representation captures a dynamic scene in real time at 30 Hz, which is sufficient for most manipulation tasks. Leveraging the representation based on our object-aware Gaussian splatting, we are able to solve language-conditioned dynamic grasping, in which the robot grasps moving objects specified by open-vocabulary queries. We also use the representation to train a visuomotor policy via behavior cloning and show that the policy achieves results comparable to those of image-based policies with pretrained encoders. Videos are available at https://object-aware-gaussian.github.io
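To illustrate what updating Gaussians "together" by semantic label could look like, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the update rule, so the sketch assumes each object moves rigidly and that a per-object rotation and translation are already known. The function name `update_gaussians_by_object` and its parameters are hypothetical.

```python
import numpy as np


def update_gaussians_by_object(means, labels, transforms):
    """Move every Gaussian center with its object's rigid transform.

    Assumption (not from the paper): each object moves rigidly.

    means:      (N, 3) array of Gaussian centers.
    labels:     (N,) integer array, one object label per Gaussian.
    transforms: dict mapping label -> (R, t), with R a (3, 3) rotation
                and t a (3,) translation (hypothetical per-object motion).
    Labels without an entry in `transforms` are left unchanged.
    """
    means = np.asarray(means, dtype=float)
    labels = np.asarray(labels)
    updated = means.copy()
    for label, (R, t) in transforms.items():
        mask = labels == label  # all Gaussians belonging to this object
        # Row vectors: v @ R.T equals (R @ v) for each center v.
        updated[mask] = means[mask] @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    return updated


# Two Gaussians on object 0, one on object 1.
means = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 2.0, 2.0]])
labels = np.array([0, 0, 1])

# Object 0 rotates 90 degrees about the z-axis and lifts by 1 along z.
rot_z_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = update_gaussians_by_object(means, labels, {0: (rot_z_90, np.array([0.0, 0.0, 1.0]))})
```

Because one transform is shared by all Gaussians carrying a label, the cost of an update scales with the number of objects that moved rather than requiring an independent optimization per Gaussian, which is the intuition behind the fast updates described in the abstract.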
Submission Number: 6