Abstract: 3D scene segmentation is a crucial task in computer vision, with applications in autonomous driving, augmented reality, and robotics. Traditional methods often struggle to produce segmentations that are consistent and accurate across different viewpoints. To address this, we turn to the growing field of novel view synthesis. Methods such as NeRF and 3DGS take a set of images and implicitly learn a multi-view-consistent representation of the scene geometry; the same strategy can be extended to learn a 3D segmentation of the scene that is consistent with the 2D segmentations of an initial training set of input images.
We introduce Contrastive Gaussian Clustering, a novel approach for segmentation novel-view synthesis and 3D scene segmentation. We extend 3D Gaussian Splatting with a learnable 3D feature field, which allows us to cluster the 3D Gaussians into objects. Using a combination of contrastive learning and spatial regularization, our model can be trained on inconsistent 2D segmentation labels and still learn to generate multi-view-consistent masks. Moreover, the resulting model is highly accurate, improving the IoU of the predicted masks over the state of the art.
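To make the training signal concrete, the following is a minimal sketch of the kind of per-view contrastive objective the abstract describes: features rendered for pixels that share a 2D segmentation label in a given view are pulled together, while features under different labels are pushed apart. Because the loss is computed per view, the mask IDs need not be consistent across views. The function name, the InfoNCE-style formulation, and the temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def contrastive_clustering_loss(features, mask_ids, temperature=0.1):
    """Illustrative per-view contrastive loss (not the paper's exact loss).

    features : (N, D) array of rendered feature vectors, one per sampled pixel.
    mask_ids : (N,) integer array of 2D segmentation labels for those pixels.
    Pixels with the same label form positive pairs; all others are negatives.
    """
    # L2-normalize so similarities are cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature                      # (N, N) similarity logits
    n = len(mask_ids)
    eye = np.eye(n, dtype=bool)
    positives = (mask_ids[:, None] == mask_ids[None, :]) & ~eye

    # Numerically stable log-softmax over each row, excluding self-similarity.
    sim_masked = np.where(eye, -np.inf, sim)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_z = row_max + np.log(np.exp(sim_masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_z

    # InfoNCE-style: maximize log-probability of positive pairs.
    return -log_prob[positives].mean()
```

With well-separated per-cluster features the loss approaches zero, while random features yield a clearly larger value, which is the gradient signal that drives the Gaussians' feature field toward one embedding per object.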