Keywords: 3D scene object segmentation, unsupervised learning
Abstract: We study the hard problem of 3D object segmentation in complex point clouds
without requiring human labels of 3D scenes for supervision. Existing unsupervised
methods group 3D points into objects by relying on the similarity of pretrained 2D
features or on external signals such as motion. As a result, they are usually limited
to identifying simple objects like cars, or their segmented objects are often inferior
due to the lack of objectness in pretrained features. In this paper, we propose a new
two-stage pipeline called GrabS. The core concept of our method is to learn generative
and discriminative object-centric priors as a foundation from object datasets in the
first stage, and then to design an embodied agent that learns to discover multiple
objects by querying against the pretrained generative priors in the second stage. We
extensively evaluate our method on two real-world datasets and a newly created
synthetic dataset, where it clearly surpasses all existing unsupervised methods in
segmentation performance.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 883